In regression analysis, we employ contour projection (CP) to develop a new
dimension reduction theory. Accordingly, we introduce the notions of the
central contour subspace and generalized contour subspace. We show that both
of their structural dimensions are no larger than that of the central
subspace [Cook, Regression Graphics (1998b) Wiley]. Furthermore, we employ
CP-sliced inverse regression, CP-sliced average variance estimation and
CP-directional regression to estimate the generalized contour subspace, and
we subsequently obtain their theoretical properties. Monte Carlo studies
demonstrate that the three CP-based dimension reduction methods outperform
their corresponding non-CP approaches when the predictors have heavy-tailed
elliptical distributions. An empirical example is also presented to
illustrate the usefulness of the CP method.

R. LUO, H. WANG AND C.-L. TSAI

Received April 2008; revised December 2008.

Supported in part by National Natural Science Foundation of China Grant
10771006 and a grant from Microsoft Research Asia.

AMS 2000 subject classifications. Primary 62G08; secondary 62G35, 62G20.

Key words and phrases. Central subspace, central contour subspace, contour
projection, directional regression, generalized contour subspace, kernel
contour subspace, $\sqrt{n}$-consistency, sliced average variance estimation,
sliced inverse regression, sufficient contour subspace.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2009, Vol. 37, No. 6B, 3743-3778. This reprint differs from the original in
pagination and typographic detail.

1. Introduction. In high-dimensional data analysis, Li (1991) proposed
a method of effective dimension reduction and Cook (1998b) subsequently
introduced the concept of sufficient dimension reduction. Their novel
approaches allow us to study low-dimensional regression relationships prior
to model formulations. To effectively estimate the basis of a dimension
reduction subspace, various methods have been developed. They include, but
are not limited to, sliced inverse regression (SIR) [Li (1991)], sliced
average variance estimation (SAVE) [Cook and Weisberg (1991)], principal
Hessian directions (PHD) [Li (1992) and Cook (1998a)], minimum average
variance estimator (MAVE) [Xia et al. (2002)], contour regression (CR) [Li,
Zha and Chiaromonte (2005)], inverse regression estimation (IRE) [Cook and
Ni (2005)], the Fourier method (Fourier) [Zhu and Zeng (2006)], directional
regression (DR) [Li and Wang (2007)], a constructive approach [Xia (2007)],
sliced regression (SR) [Wang and Xia (2008)] and also those methods based
on higher-order moments [Yin and Cook (2002, 2003, 2004)].

One of the objectives of dimension reduction is to seek a central subspace 
(CS) [Cook (1994, 1998b)], which contains all information for the regression 
of response Y on predictor X. To estimate the CS, two technical conditions 
are commonly used: the linearity condition [Li (1991)] and the constant vari- 
ance condition [Cook and Weisberg (1991), Li (1992), Cook (1998a) and Li, 
Zha and Chiaromonte (2005)]. For example, SIR requires the linearity con- 
dition, while SAVE, PHD and DR entail both the linearity and constant 
variance conditions. It is known that the elliptically symmetric distribu- 
tion of X with a finite first moment implies the linearity condition [Li and 
Duan (1989)], and that the normality assumption of X ensures the constant 
variance condition [Cook and Weisberg (1991)]. To facilitate the use of di- 
mension reduction methods, Cook and Nachtsheim (1994) studied the role 
of elliptical symmetry in regression. In addition, they proposed a weighting 
procedure to achieve elliptically symmetric covariates. This motivates us 
to investigate dimension reduction methods via the elliptically symmetric 
assumption, which was also considered by Li, Zha and Chiaromonte (2005). 

In the class of elliptically symmetric distributions, some either have heavy- 
tailed behavior or do not have finite moments. Accordingly, many existing 
dimension reduction methods may not yield accurate estimators of the CS. 
Hence, Wang, Ni and Tsai (2008) proposed the contour projection (CP) ap- 
proach to project the covariate vector onto a unit contour. The resulting 
predictor vector has finite moments of every order and improves parameter 
estimators for heavy-tailed predictors. However, the theoretical properties 
of CP have not been thoroughly investigated. To this end, the aim of this 
paper is to establish a theoretical paradigm for contour projected dimension 
reduction. We introduce the notions of a central contour subspace (CCS)
and a generalized contour subspace (GCS). Under appropriate conditions,
the unique existence of the CCS and the GCS is established and their
relationships with the CS are investigated. In addition, we show that their
structural dimensions are no larger than that of the CS. Moreover, we
obtain the theoretical properties of CP-sliced inverse regression (CP-SIR),
CP-sliced average variance estimation (CP-SAVE) and CP-directional
regression (CP-DR), and study the population exhaustiveness of those three
CP methods. Consequently, the CP approach not only possesses theoretical
justification but also broadens the use of existing dimension reduction
methods.

The rest of this paper is organized as follows. Section 2 introduces contour 
projected dimension reduction. Section 3 investigates the population features 
of CP-SIR and CP-SAVE, while Section 4 studies CP-DR. The sampling
properties of CP methods are studied in Section 5. Extensive simulation 
experiments are reported in Section 6, and a real example is analyzed in 
Section 7. We conclude with a brief discussion in Section 8, and all technical 
details are left to the Appendix. 

2. Contour projection and sufficient dimension reduction. 

2.1. Sufficient dimension reduction. Let $X = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$
be a p-dimensional predictor with $p > 1$ and $Y \in \mathbb{R}^1$ be the response
of interest. To capture their regression relationship, we adopt the following
commonly used dimension reduction model:

(2.1) $Y \perp\!\!\!\perp X \mid A^T X$,

where the response $Y$ is conditionally independent ($\perp\!\!\!\perp$) of $X$
given $A^T X$, with $A \in \mathbb{R}^{p \times d}$ and $d \le p$. Let
$P_A = A(A^T A)^{-1} A^T$ and $Q_A = I_p - P_A$, where $I_p$ is the
p-dimensional identity matrix. As a result, the model (2.1) is equivalent
to $Y \perp\!\!\!\perp X \mid P_A X$. For the sake of convenience, we use the
generic notation $\mathcal{S}(H)$ to denote the linear subspace spanned by the
column vectors of an arbitrary matrix $H$. We then refer to $\mathcal{S}(A)$ as
a sufficient dimension reduction (SDR) [Cook (1998b)] subspace. When $A$ is a
$p \times p$ full rank matrix, $\mathcal{S}(A)$ is automatically an SDR subspace.
In practice, however, we are only interested in the "smallest" SDR subspace,
which is typically defined to be the intersection of all SDR subspaces. If
such an intersection is itself an SDR subspace, it is called the central
subspace (CS) [Cook (1996, 1998b)]. Hereafter, we always assume that the CS
exists and is denoted by $\mathcal{S}_{y|x}$. Next, we study the CS via CP.
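
The projection matrices $P_A$ and $Q_A$ used in (2.1) can be illustrated with a minimal NumPy sketch. The function name `projection_matrices` and the random test matrix are illustrative assumptions, not part of the paper; the code only checks the textbook properties that $P_A$ is idempotent, leaves $\mathcal{S}(A)$ fixed, and depends on $A$ only through its column space.

```python
import numpy as np

def projection_matrices(A):
    """Return (P_A, Q_A) for a full-column-rank p x d matrix A (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    # P_A = A (A^T A)^{-1} A^T, computed with a linear solve rather than an explicit inverse
    P = A @ np.linalg.solve(A.T @ A, A.T)
    Q = np.eye(A.shape[0]) - P
    return P, Q

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))            # arbitrary basis of a 2-dimensional subspace of R^5
P, Q = projection_matrices(A)

# Any other basis of the same subspace gives the same projection
M = np.array([[2.0, 1.0], [0.0, 3.0]]) # invertible change of basis
P_other, _ = projection_matrices(A @ M)
```

Because $P_A$ is basis-free, conditioning on $A^T X$ and on $P_A X$ carries the same information, which is why (2.1) can be stated either way.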

2.2. Contour projection. For statistical validity, inverse regression
methods commonly require the linearity condition of Li (1991), which assumes
that $E(X \mid b^T X)$ is a linear function of $b^T X$, where
$b \in \mathcal{S}_{y|x}$ is an arbitrary nonrandom direction. Because $b$ is
unknown in practice, it is sensible to require that the linearity condition
holds for any arbitrary direction $b \in \mathbb{R}^p$. As noted by Eaton (1986)
and Cook and Nachtsheim (1994), such a requirement can only be satisfied by
the so-called elliptically symmetric distribution. Its probability density
function is given by [Muirhead (1982)]

(2.2) $f_{\mu,\Sigma}(X) = |\Sigma|^{-1/2} f(\|X - \mu\|_{\Sigma}^2)$,

where $\mu \in \mathbb{R}^p$ is the location parameter, $\Sigma \in \mathbb{R}^{p \times p}$
is the positive definite scatter matrix and
$\|X - \mu\|_{\Sigma}^2 = (X - \mu)^T \Sigma^{-1} (X - \mu)$ is a Mahalanobis
distance. For the sake of identifiability, we require that
$\mathrm{tr}(\Sigma) = p$ [Muirhead (1982)]. Without loss of generality, we also
assume that $\Sigma = I_p$ and $\mu = 0$, which can be achieved by redefining
$X = \Sigma^{-1/2}(X - \mu)$.

As mentioned above, if $X$ satisfies the linearity condition for any arbitrary
direction $b$, the distribution of $X$ must be elliptically symmetric. However,
this is only valid when the finite moments of an elliptically symmetric
distribution exist. To avoid the issue of the existence of finite moments, we
adopt the method of Wang, Ni and Tsai (2008) and propose the following contour
projection operation:

(2.3) $\vec{X} = (\vec{X}_1, \ldots, \vec{X}_p)^T = X/R$,

where $R = \|X\|$ and $\|\cdot\|$ is the typical $L_2$ norm. As a result,
$\vec{X}$ is the contour projected predictor, which has finite moments of every
order. It can be shown that the support of $\vec{X}$ is the unit contour
$\{\vec{x} : \|\vec{x}\| = 1\}$, as long as the support of $X$ contains an open
convex set that includes the origin as an interior point. Although Wang, Ni
and Tsai (2008) employed CP in the context of inverse regression, the
properties of contour projected dimension reduction have not been well studied
yet. This motivates us to establish the theoretical foundation for CP in the
subsequent sections.
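
The contour projection (2.3) can be sketched numerically as follows, assuming $\mu = 0$ and $\Sigma = I_p$ as in the text. The helper name `contour_project` and the choice of a spherical multivariate Cauchy sample (a multivariate t with one degree of freedom, which has no finite mean) are illustrative assumptions. The sketch shows that the projected rows are bounded and that their empirical second-moment matrix is close to $I_p/p$.

```python
import numpy as np

def contour_project(X):
    """Map each row x of X to x / ||x||_2 (assumes mu = 0 and Sigma = I_p)."""
    X = np.asarray(X, dtype=float)
    R = np.linalg.norm(X, axis=1, keepdims=True)
    return X / R

rng = np.random.default_rng(1)
n, p = 10_000, 4
# Spherical multivariate Cauchy: standard normal vector divided by |N(0, 1)|
Z = rng.normal(size=(n, p))
X = Z / np.abs(rng.normal(size=(n, 1)))

X_cp = contour_project(X)
second_moment = X_cp.T @ X_cp / n      # close to I_p / p by spherical symmetry
```

Every coordinate of the projected predictor lies in $[-1, 1]$, so moments of all orders exist even though the raw predictor has none.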

2.3. Central contour subspace. When $X$ follows an elliptically symmetric
distribution as defined in (2.2), Wang, Ni and Tsai (2008) noted that $R$ and
$\vec{X}$ are mutually independent. Accordingly, we consider the following
contour projected dimension reduction model:

(2.4) $Y \perp\!\!\!\perp \vec{X} \mid B^T \vec{X}$

for some $B \in \mathbb{R}^{p \times d}$. We then label the resulting space
$\mathcal{S}(B)$ a sufficient contour subspace (SCS) of $Y|\vec{X}$. Adopting
the CS concept of Cook (1996, 1998b), we define the intersection of all SCSs
as the kernel contour subspace (KCS) and denote it by $\mathcal{K}_{y|\vec{x}}$.
If $\mathcal{K}_{y|\vec{x}}$ itself is also an SCS, we call it the central
contour subspace (CCS) and denote it by $\mathcal{C}_{y|\vec{x}}$.

Under mild yet reasonable conditions [Cook (1998b)], the CS can be well
defined as the intersection of all SDR subspaces. However, in the CP context,
one can easily construct examples such that the KCS is not an SCS, and hence
the CCS does not exist. Consider the following example:

Example 1.

(2.5) $Y = \sum_{j=2}^{p} X_j^2 + \varepsilon = R^2 \Big( \sum_{j=2}^{p} \vec{X}_j^2 \Big) + \varepsilon = R^2 (1 - \vec{X}_1^2) + \varepsilon$,

where $p > 2$ and $X$ follows an elliptically symmetric distribution. Note that
$\varepsilon$ in (2.5) and hereafter satisfies $\varepsilon \perp\!\!\!\perp \vec{X}$.
Let $e_j \in \mathbb{R}^p$ denote a p-dimensional vector with its jth component
being 1 and others 0. Then, the second equality in (2.5) results in one SCS,
$\mathcal{S}_a = \mathcal{S}(e_2, \ldots, e_p)$, while the third equality yields
another SCS, $\mathcal{S}_b = \mathcal{S}(e_1)$. Nevertheless,
$\mathcal{S}_a \cap \mathcal{S}_b = \emptyset$ is an empty set. Thus, the CCS
does not exist. A similar example was also constructed by Cook (1994).
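
The three expressions for the signal in (2.5) can be checked numerically. The sketch below is only an illustration: the heavy-tailed spherical sample (a multivariate t with 3 degrees of freedom), the sample size and the variable names are assumptions. It confirms that the same quantity can be written through $(\vec{X}_2, \ldots, \vec{X}_p)$ or through $\vec{X}_1$ alone, which is what produces two SCSs with trivial intersection.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, df = 1_000, 5, 3
# Spherical multivariate t sample: normal vector over sqrt(chi-square / df)
X = rng.normal(size=(n, p)) / np.sqrt(rng.chisquare(df, size=(n, 1)) / df)

R = np.linalg.norm(X, axis=1)
X_cp = X / R[:, None]

signal_raw = (X[:, 1:] ** 2).sum(axis=1)               # sum_{j>=2} X_j^2
signal_a = R ** 2 * (X_cp[:, 1:] ** 2).sum(axis=1)     # uses directions e_2, ..., e_p
signal_b = R ** 2 * (1.0 - X_cp[:, 0] ** 2)            # uses direction e_1 only
```
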

Example 1 indicates that the CCS may not be well defined if the regression
relationship is symmetric. This motivates us to present the following three
definitions so that we can assess the existence of the CCS.

Definition 1. We define $Y|X$ to be dimension reducible if (2.1) holds
for some $d < p$. Otherwise, $Y|X$ is dimension irreducible. Analogously, we
define $Y|\vec{X}$ as dimension reducible if (2.4) holds for some $d < p$.
Otherwise, $Y|\vec{X}$ is dimension irreducible.

If $Y|X$ is dimension reducible, then (2.1) holds for $d < p$. By Lemma 2 of
Wang, Ni and Tsai (2008), we have $Y \perp\!\!\!\perp \vec{X} \mid A^T \vec{X}$,
which implies that $Y|\vec{X}$ is also dimension reducible. As a result, if
$Y|\vec{X}$ is dimension irreducible, then $Y|X$ is dimension irreducible.
However, the reverse is not true. This indicates that the CP method might
provide a better dimension reduction than SDR. To this end, we next define
contour symmetric.

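Definition 2. Let $G_y(\vec{X} = \vec{x}) = P(Y \le y \mid \vec{X} = \vec{x})$,
and assume that $Y|\vec{X}$ is dimension reducible. We then term $Y|\vec{X}$
contour symmetric on directions in $\mathcal{S}(B_1)$, with
$B_1 \in \mathbb{R}^{p \times d_1}$ and $1 \le d_1 < p$, if it satisfies

(2.6) $G_y(\vec{X} = \vec{x}) = G_y(\|P_{B_1} \vec{X}\| = \|P_{B_1} \vec{x}\|, P_{B_2} \vec{X} = P_{B_2} \vec{x})$,

where $\vec{x} \in \mathbb{R}^p$ is an arbitrary vector satisfying
$\|\vec{x}\| = 1$, $B_2 \in \mathbb{R}^{p \times d_2}$ for some $0 \le d_2 < p$
satisfying the conditions $\mathcal{S}(B_1) \cap \mathcal{S}(B_2) = \emptyset$,
$\mathcal{S}(B_1) \cup \mathcal{S}(B_2) \ne \mathbb{R}^p$, and
$G_y(\vec{X} = \vec{x})$ is a nondegenerate function in $\|P_{B_1} \vec{x}\|$.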

Equation (2.6) implies that $\mathcal{S}(B_1, B_2)$ is an SCS. Thus, we only
need to focus on the symmetric directions in SCSs. For the sake of
convenience, we require that $\mathcal{S}(B_1)$ and $\mathcal{S}(B_2)$ do not
have any overlap. Otherwise, one can redefine $B_1 = (I_p - P_{B_2}) B_1$ so
that (2.6) is still valid. One might wonder why we impose the constraint
$\mathcal{S}(B_1) \cup \mathcal{S}(B_2) \ne \mathbb{R}^p$. Consider an arbitrary
dimension reducible $Y|\vec{X}$, and assume that one of its SCSs is given by
$\mathcal{S}(B)$ with $\dim\{\mathcal{S}(B)\} < p$, where $\dim\{\cdot\}$ stands
for the dimension of a linear subspace. If we do not impose the constraint
$\mathcal{S}(B_1) \cup \mathcal{S}(B_2) \ne \mathbb{R}^p$, then we can define
$B_2 = B$ and $B_1$ as a basis of the linear subspace whose orthogonal
complement is $\mathcal{S}(B)$. Accordingly,
$\|P_{B_1} \vec{X}\|^2 = 1 - \|P_{B_2} \vec{X}\|^2$. This together with the
assumption of $\mathcal{S}(B_2)$ being an SCS implies that

(2.7) $G_y(\vec{X} = \vec{x}) = G_y(P_{B_2} \vec{X} = P_{B_2} \vec{x})$
      $= G_y(\|P_{B_2} \vec{X}\| = \|P_{B_2} \vec{x}\|, P_{B_2} \vec{X} = P_{B_2} \vec{x})$
      $= G_y(\|P_{B_1} \vec{X}\| = \|P_{B_1} \vec{x}\|, P_{B_2} \vec{X} = P_{B_2} \vec{x})$.

Hence, condition (2.6) is satisfied for any dimension reducible $Y|\vec{X}$.
Consequently, Definition 2 loses its ability to characterize the nontrivial
symmetric structure of $Y|\vec{X}$. Therefore, requiring
$\mathcal{S}(B_1) \cup \mathcal{S}(B_2) \ne \mathbb{R}^p$ is essential. In
contrast to contour symmetric, we define contour asymmetric as given below.

Definition 3. We call $Y|\vec{X}$ contour asymmetric if $Y|\vec{X}$ is not
contour symmetric on any possible linear subspace
$\mathcal{S}(B_1) \subset \mathbb{R}^p$ with $0 < \dim\{\mathcal{S}(B_1)\} < p$.

Definition 3 leads to the next theorem for the existence of the CCS.

Theorem 1. The central contour subspace, $\mathcal{C}_{y|\vec{x}}$, exists
uniquely if and only if $Y|\vec{X}$ is contour asymmetric, that is,
$\mathcal{K}_{y|\vec{x}} = \mathcal{C}_{y|\vec{x}}$.

The above theorem provides a necessary and sufficient condition for the
unique existence of the CCS, that is, the regression relationship $Y|\vec{X}$
must be contour asymmetric. This naturally raises an interesting question:
what happens if $Y|\vec{X}$ is not contour asymmetric? This critical question
is addressed in the next subsection.

2.4. Generalized contour subspace. For the existence of the CCS, Theorem 1
requires a strong condition that is likely to be violated under some
symmetric situations; see Example 1 in (2.5). This motivates us to redefine
the "smallest" SCS as the SCS with the smallest structural dimension,
rather than the intersection of all SCSs (i.e., the KCS
$\mathcal{K}_{y|\vec{x}}$). The resulting space is called the generalized
contour subspace (GCS), denoted by $\mathcal{G}_{y|\vec{x}}$. Because
$\mathbb{R}^p$ itself is an SCS and an SCS's dimension must be a positive
integer no larger than $p$, there exists at least one GCS, and its dimension
is unique (denoted by $d_0$). It is noteworthy that
$\mathcal{K}_{y|\vec{x}} \subseteq \mathcal{G}_{y|\vec{x}}$ but
$\mathcal{K}_{y|\vec{x}}$ is not guaranteed to be an SCS.

The existence of a GCS does not guarantee its uniqueness, so we need to
find a reasonable condition to ensure the uniqueness of the GCS. To gain
some insights, we consider the following example:

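Example 2.

(2.8) $Y = \vec{X}_1 + \vec{X}_2^2 + \vec{X}_3^2 + \varepsilon = \vec{X}_1 + \Big( 1 - \vec{X}_1^2 - \sum_{j=4}^{p} \vec{X}_j^2 \Big) + \varepsilon$.

One can easily verify that there exist at least two different SCSs, whose
intersection is not an SCS. For example, the first equality yields
$\mathcal{G}_a = \mathcal{S}(e_1, e_2, e_3)$, while the second equality results
in $\mathcal{G}_b = \mathcal{S}(e_1, e_4, \ldots, e_p)$. However, their
intersection $\mathcal{G}_a \cap \mathcal{G}_b = \mathcal{S}(e_1)$ is not an
SCS. If we assume that the GCS is not uniquely defined, then it is natural to
have both $\mathcal{G}_a$ and $\mathcal{G}_b$ be GCSs, which implies that
$\dim(\mathcal{G}_a) = \dim(\mathcal{G}_b)$. Because
$\mathcal{K}_{y|\vec{x}} = \mathcal{G}_a \cap \mathcal{G}_b = \mathcal{S}(e_1)$,
and $\dim(\mathcal{G}_a) = \dim(\mathcal{G}_b)$, we obtain

(2.9) $\dim(\mathcal{G}_a \setminus \mathcal{K}_{y|\vec{x}}) = \dim(\mathcal{G}_b \setminus \mathcal{K}_{y|\vec{x}})$.

This leads to $p = 5$, so we have

(2.10) $(\mathcal{G}_a \setminus \mathcal{K}_{y|\vec{x}}) \cup (\mathcal{G}_b \setminus \mathcal{K}_{y|\vec{x}}) = (\mathbb{R}^p \setminus \mathcal{K}_{y|\vec{x}})$.

Both (2.9) and (2.10) together imply a necessary condition for the existence
of multiple GCSs. This necessary condition is

(2.11) $\dim(\mathcal{G}_a \setminus \mathcal{K}_{y|\vec{x}}) = \tfrac{1}{2} \dim(\mathbb{R}^p \setminus \mathcal{K}_{y|\vec{x}}) \iff d_0 = \tfrac{1}{2}\{p + \dim(\mathcal{K}_{y|\vec{x}})\}$.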

Example 2 motivates us to find a sufficient condition for the uniqueness
of the GCS. In most applications, the structural dimension $d_0$ is expected
to be much smaller than the predictor dimension [Chiaromonte, Cook and
Li (2002)]. Thus, a very typical violation of equality (2.11) is
$d_0 < p/2 \le \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$. Under such a
condition, the uniqueness of the GCS can be rigorously established.

Theorem 2. If there exists at least one SCS with structural dimension
$d < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$, then the GCS of $Y|\vec{X}$ is
unique. In addition, if $\mathcal{C}_{y|\vec{x}}$ exists, then
$\mathcal{G}_{y|\vec{x}} = \mathcal{C}_{y|\vec{x}}$ and
$\mathcal{G}_{y|\vec{x}}$ is unique.

The above theorem indicates that the GCS can be well defined under a
rather mild condition. It also shows that the existence of the CCS implies
that of the GCS. This raises another interesting question: what is the
relationship between the GCS and the CS [Cook (1996) and Cook (1998b)]?
This issue is addressed below.

Theorem 3. The relationship between the sufficient contour subspace
and the CS is such that: (1) $\mathcal{K}_{y|\vec{x}} \subseteq \mathcal{S}_{y|x}$;
and (2) $\dim(\mathcal{G}_{y|\vec{x}}) \le \dim(\mathcal{S}_{y|x})$.

Theorem 3 shows that the KCS is a subspace of the CS and the dimension
of the GCS cannot be larger than that of the CS. In addition, Lemma 2 of
Wang, Ni and Tsai (2008) indicates that the CS must be an SCS.

To further explore the relationship, one might wonder whether the GCS
must be the CS. We can easily verify with the following example that this
is not necessarily the case.

Example 3.

(2.12) $Y = \|X\|^2 + \varepsilon = R^2 + \varepsilon$.

The first equality of Example 3 demonstrates that the dimension of the CS
is the same as the predictor dimension $p$. Thus, $Y|X$ is dimension
irreducible. However, the second equality indicates that the dimension of the
GCS is only 0, which represents a substantial dimension reduction.
Accordingly, the GCS is not the CS.

Since the structural dimension of the GCS is never larger than that of
the CS, one might question whether the GCS is always a subspace of the
CS. The answer is negative, which is illustrated by Example 1. As one can
see, the first equality of (2.5) implies that the CS is
$\mathcal{S}_{y|x} = \mathcal{S}(e_2, \ldots, e_p)$, while the third equality
of (2.5) indicates that $\mathcal{G}_{y|\vec{x}} = \mathcal{S}(e_1)$. As a
result, the intersection of the GCS and the CS is an empty set, which means
GCS $\not\subseteq$ CS in this example.

In summary, Theorem 3, together with Examples 1 and 3, indicates that
the GCS is closely related to but not exactly the same as the CS. Most
importantly, the structural dimension of the GCS is guaranteed to be no
larger, but might be much smaller, than that of the CS. Finally, one might
wonder when we can have $\mathcal{G}_{y|\vec{x}} = \mathcal{S}_{y|x}$. To this
end, the following theorem provides a sufficient condition.

Theorem 4. Assume that $Y|X$ is dimension reducible. If $Y|\vec{X}$ is
contour asymmetric or
$\dim(\mathcal{S}_{y|x}) < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$, then
$\mathcal{G}_{y|\vec{x}} = \mathcal{S}_{y|x}$.

As mentioned above, in most applications the structural dimension
$\dim(\mathcal{S}_{y|x})$ is expected to be much smaller than the predictor
dimension [Chiaromonte, Cook and Li (2002)]. Thus, the condition
$\dim(\mathcal{S}_{y|x}) < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$ is easily
satisfied as long as we have $\dim(\mathcal{S}_{y|x}) < p/2$. As a result, the
technical condition entailed by Theorem 4 [i.e.,
$\dim(\mathcal{S}_{y|x}) < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$] is rather
mild, which implies that the GCS is usually equal to the CS in practice.
However, this condition is sufficient but not necessary. See the following
counterexample.

Example 4.

$Y = |X_1| + |X_2| + \varepsilon = R(|\vec{X}_1| + |\vec{X}_2|) + \varepsilon$,

where $X \in \mathbb{R}^4$. In this example, we have
$\mathcal{K}_{y|\vec{x}} = \emptyset$. In addition,
$\dim(\mathcal{S}_{y|x}) = 2 \ge \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2 = 2$,
whereas $\mathcal{S}_{y|x} = \mathcal{G}_{y|\vec{x}} = \mathcal{S}(e_1, e_2)$.
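
The dimension condition shared by Theorems 2 and 4 is simple arithmetic, and the examples above can be checked against it directly. The helper name `gcs_condition_holds` is a hypothetical label for the inequality $d < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$; the third call uses made-up values for a typical low-dimensional case.

```python
def gcs_condition_holds(d, p, dim_kcs):
    """Sufficient condition of Theorems 2 and 4: d < (p + dim(KCS)) / 2."""
    return 2 * d < p + dim_kcs

# Example 2 with p = 5: both candidate SCSs have dimension 3 and dim(KCS) = 1,
# so equality (2.11) holds and the strict inequality fails.
example2 = gcs_condition_holds(d=3, p=5, dim_kcs=1)

# Example 4: p = 4, dim(CS) = 2, dim(KCS) = 0; the condition fails,
# yet the GCS still equals the CS (sufficient, not necessary).
example4 = gcs_condition_holds(d=2, p=4, dim_kcs=0)

# A hypothetical typical case: two directions among ten predictors.
typical = gcs_condition_holds(d=2, p=10, dim_kcs=0)
```
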

In this section, we have established the foundations of the CP, explored 
the properties of the GCS and the CCS, and made connections between the 
GCS and the CS. To facilitate the use of the GCS in dimension reduction, 
we now turn to studying the properties of inverse regression methods via 
CP in the next two sections. 

3. Contour projected SIR and SAVE. To improve dimension reduction,
Wang, Ni and Tsai (2008) employed SIR and SAVE on the CS estimation
via the contour projected predictor $\vec{X}$. For the sake of simplicity, we
refer to these methods as CP-SIR and CP-SAVE, respectively. Although Wang,
Ni and Tsai (2008) investigated some connections between the CP approach
and the CS, they did not solve the identifiability problem due to contour
symmetry. In addition, CP is more directly related to the GCS than the CS.
These findings motivate us to establish the relationships between the GCS
and the linear subspaces generated by the kernel matrices of CP-SIR and
CP-SAVE, respectively.

3.1. The CP-SIR method. In the presence of finite moments, SIR employs
the kernel matrix $\mathrm{cov}\{E(X \mid Y)\}$. This naturally motivates a
CP-SIR method with the kernel matrix $\mathrm{cov}\{E(\vec{X} \mid Y)\}$; see
Wang, Ni and Tsai (2008). Then, its statistical properties can be studied as
follows.

Lemma 1. Assume that $X$ has the density function in (2.2). We then
have $\mathcal{S}\{E(\vec{X} \mid Y)\} \subseteq \mathcal{K}_{y|\vec{x}}$.

Lemma 1 implies that CP-SIR is able to estimate a portion of the KCS
$\mathcal{K}_{y|\vec{x}}$, which is a subspace of the GCS. By definition, we
know that $\mathcal{K}_{y|\vec{x}} = \bigcap_{\mathcal{S}(B) \in \mathcal{I}} \mathcal{S}(B)$,
where $\mathcal{S}(B)$ is an SCS and $\mathcal{I}$ is the set of all SCSs. In
addition, we can decompose $\mathcal{S}(B)$ as
$\mathcal{S}(B_A) \cup \mathcal{S}(B_S) = \mathcal{S}(B)$ with
$\mathcal{S}(B_A) \cap \mathcal{S}(B_S) = \emptyset$ and

(3.1) $G_y(\vec{X} = \vec{x}) = G_y(\|P_{B_S} \vec{X}\| = \|P_{B_S} \vec{x}\|, P_{B_A} \vec{X} = P_{B_A} \vec{x})$

for some $B_A$ and $B_S$. It is noteworthy that such a decomposition always
exists. This is because we can always set $\mathcal{S}(B_S) = \emptyset$ and
$\mathcal{S}(B_A)$ to be an arbitrary SCS. However, such a decomposition is
not unique. Consider, for example:

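Example 5.

$Y = |X_1| + |X_2| + X_3 + \varepsilon = R(|\vec{X}_1| + |\vec{X}_2| + \vec{X}_3) + \varepsilon$,

where $X = (X_1, X_2, X_3)^T \in \mathbb{R}^3$. Here, we can decompose
$\mathcal{S}(B)$ as
$\{\mathcal{S}(B_S^1) = \emptyset, \mathcal{S}(B_A^1) = \mathcal{S}(e_1, e_2, e_3)\}$
(or $\{\mathcal{S}(B_S^2) = \mathcal{S}(e_1), \mathcal{S}(B_A^2) = \mathcal{S}(e_2, e_3)\}$,
or $\{\mathcal{S}(B_S^3) = \mathcal{S}(e_2), \mathcal{S}(B_A^3) = \mathcal{S}(e_1, e_3)\}$).
Thus, the decomposition (3.1) is not unique. In contrast,
$\mathcal{K}_{y|\vec{x}} = \mathcal{S}(e_3) = \mathcal{S}(B_A^1) \cap \mathcal{S}(B_A^2) \cap \mathcal{S}(B_A^3)$
is unique, which motivates us to further characterize the properties of the
KCS given below.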

Lemma 2. Assume $X$ has the density function in (2.2). $\mathcal{S}(B)$ and
$\mathcal{S}(B_A)$ are defined as above. Then:
(1) $\mathcal{K}_{y|\vec{x}} = \bigcap_{\mathcal{S}(B) \in \mathcal{I}} \mathcal{S}(B_A)$;
(2) if $Y|\vec{X}$ is contour symmetric on some direction, then
$\mathcal{K}_{y|\vec{x}} \ne \mathcal{G}_{y|\vec{x}}$; (3) if $Y|\vec{X}$ is
contour asymmetric, then $\mathcal{K}_{y|\vec{x}} = \mathcal{G}_{y|\vec{x}}$.

Lemma 2(1) shows that the KCS excludes all symmetric directions in the
SCS. Accordingly, if $Y|\vec{X}$ is contour symmetric on some direction, then
the KCS is not an SCS, and hence cannot be a GCS. This is the result of
Lemma 2(2). In addition, Lemma 2(3) indicates that the KCS becomes a GCS if
$Y|\vec{X}$ is contour asymmetric. As a result, the KCS becomes the CCS (see
Theorem 1). This finding bridges $\mathcal{S}\{E(\vec{X} \mid Y)\}$ and the KCS
by introducing the following assumption.

Assumption 1. For any $v \in \mathcal{K}_{y|\vec{x}}$, $v \ne 0$,
$E(v^T \vec{X} \mid Y)$ is nondegenerate.

Assumption 1 is similar to the mild requirement given in Li and Wang
(2007) and Shao, Cook and Weisberg (2007). Under this assumption and the
contour asymmetric condition, we establish the population exhaustiveness
of CP-SIR, given below.

Theorem 5. Assume $X$ has the density function in (2.2). If Assumption 1
holds and $Y|\vec{X}$ is contour asymmetric, then
$\mathcal{S}\{E(\vec{X} \mid Y)\} = \mathcal{G}_{y|\vec{x}}$.
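
A sliced sample version of the CP-SIR kernel $\mathrm{cov}\{E(\vec{X} \mid Y)\}$ can be sketched as follows. This is only an illustration under stated assumptions: equal-count slicing, the number of slices, the function name `cp_sir_kernel` and the single-index test model $Y = \vec{X}_1 + 0.2\varepsilon$ are all hypothetical choices, and $\mu = 0$, $\Sigma = I_p$ are taken as known. The sampling procedure actually used in the paper is described in its Section 5 and may differ.

```python
import numpy as np

def cp_sir_kernel(X_cp, y, n_slices=10):
    """Sliced estimate of cov{E(X_cp | Y)} from contour projected rows X_cp."""
    n, p = X_cp.shape
    order = np.argsort(y)
    grand_mean = X_cp.mean(axis=0)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):    # roughly equal-count slices of y
        diff = X_cp[idx].mean(axis=0) - grand_mean
        M += (len(idx) / n) * np.outer(diff, diff)
    return M

rng = np.random.default_rng(3)
n, p = 5_000, 5
Z = rng.normal(size=(n, p))
X = Z / np.abs(rng.normal(size=(n, 1)))            # heavy-tailed spherical predictor
X_cp = X / np.linalg.norm(X, axis=1, keepdims=True)
y = X_cp[:, 0] + 0.2 * rng.normal(size=n)          # hypothetical contour asymmetric model

M = cp_sir_kernel(X_cp, y)
eigvals, eigvecs = np.linalg.eigh(M)               # eigenvalues in ascending order
leading = eigvecs[:, -1]                           # estimated direction
```

In this toy model the leading eigenvector should align with $e_1$ up to sign, even though the raw predictor has no finite mean.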

To facilitate the use of CP, we next study CP-SAVE. 

3.2. The CP-SAVE method. As demonstrated in the previous subsection,
CP-SIR fails if the regression relationship is symmetric. Under such a
situation, SAVE [Cook and Weisberg (1991)] may provide a better approach
to dimension reduction. Thus, it is of great interest to explore the
usefulness of SAVE with contour projected predictors.

For the sake of simplicity, we denote $\mathcal{G}_{y|\vec{x}} = \mathcal{S}(B_0)$,
where $B_0 \in \mathbb{R}^{p \times d_0}$ is an orthonormal basis. Let
$\tau(Y) = \{1 - \lambda(Y)\}/(p - d_0)$ and
$\lambda(Y) = E(\|B_0^T \vec{X}\|^2 \mid Y)$. Then, it can be verified that
$Q_{B_0} E(\vec{X} \vec{X}^T \mid Y) Q_{B_0} = \tau(Y) Q_{B_0}$, and the
estimator of $\tau(Y)$ is given in Section 5.4. This motivates us to consider
the CP-SAVE kernel matrix,
$M_{SAVE} = E\{[\tau(Y) I_p - E(\vec{X} \vec{X}^T \mid Y)]^2\}$, where
$E(\vec{X} \vec{X}^T \mid Y) = \mathrm{cov}(\vec{X} \mid Y)$. The following lemma
shows that CP-SAVE enables us to estimate a portion of the GCS.

Lemma 3. Under (2.2), $\mathcal{S}\{\tau(Y) I_p - E(\vec{X} \vec{X}^T \mid Y)\} \subseteq \mathcal{G}_{y|\vec{x}}$.
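
The identity $Q_{B_0} E(\vec{X} \vec{X}^T \mid Y) Q_{B_0} = \tau(Y) Q_{B_0}$ behind the CP-SAVE kernel can be checked by simulation. The sketch below is a Monte Carlo illustration only: the symmetric model $Y = \vec{X}_1^2 + 0.1\varepsilon$, the known basis $B_0 = e_1$, the five equal-count slices and all variable names are assumptions, and conditioning on $Y$ is approximated by conditioning on a slice of $Y$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, d0 = 200_000, 4, 1
X = rng.normal(size=(n, p))
X_cp = X / np.linalg.norm(X, axis=1, keepdims=True)
y = X_cp[:, 0] ** 2 + 0.1 * rng.normal(size=n)     # hypothetical symmetric model

B0 = np.eye(p)[:, :d0]                 # orthonormal basis of the assumed GCS, span(e_1)
Q = np.eye(p) - B0 @ B0.T              # Q_{B0}

max_gap = 0.0
for idx in np.array_split(np.argsort(y), 5):
    Xs = X_cp[idx]
    V = Xs.T @ Xs / len(idx)                        # slice estimate of E(X X^T | Y)
    lam = np.mean(np.sum((Xs @ B0) ** 2, axis=1))   # slice estimate of lambda(Y)
    tau = (1.0 - lam) / (p - d0)                    # tau(Y) = {1 - lambda(Y)} / (p - d0)
    max_gap = max(max_gap, np.abs(Q @ V @ Q - tau * Q).max())
```

Here `max_gap` should be of the order of the Monte Carlo error, reflecting that no constant variance condition is needed: the conditional second moment is isotropic on the complement of the GCS, with a scale $\tau(Y)$ that is allowed to change with $Y$.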

It is known that the traditional SAVE method is able to estimate a portion
of the CS under both the linearity condition and the constant variance
condition. Lemma 3 demonstrates that CP-SAVE can estimate a portion
of the GCS when the predictor follows an elliptically symmetric distribution.
It is remarkable that the constant variance condition is no longer needed
here. Furthermore, as noted by Cook and Lee (1999) and Li and Wang (2007),
SAVE estimates the CS exhaustively under some reasonable conditions. This
motivates us to study whether CP-SAVE can estimate the GCS exhaustively.
To this end, we need the following assumption.

Assumption 2. Let $\beta_{0i}$ be the ith column of $B_0$ and
$w = (w_1, \ldots, w_{d_0})^T$ be an arbitrary $d_0 \times 1$ nonzero vector,
where $B_0$ is defined as above. We then assume that
$\sum_{i=1}^{d_0} w_i \psi(Y, \beta_{0i})$ is a nondegenerate random variable
with $\psi(Y, \beta_{0i}) = E\{(\beta_{0i}^T \vec{X})^2 \mid Y\}$.

Assumption 2 is valid only if $Y|\vec{X}$ is dimension reducible. Otherwise,
we have $\sum_{i=1}^{p} \psi(Y, \beta_{0i}) = 1$, which clearly violates
Assumption 2. A similar assumption on the predictor $X$ can be found in Li
and Wang (2007) and Shao, Cook and Weisberg (2007). With the help of
Assumption 2, the population exhaustiveness of CP-SAVE can be established.

Theorem 6. Assume $X$ has the density function in (2.2), and $Y|\vec{X}$ is
dimension reducible. Then, under Assumption 2, we have

$\mathcal{S}\{\tau(Y) I_p - E(\vec{X} \vec{X}^T \mid Y)\} = \mathcal{G}_{y|\vec{x}}$.

In addition to CP-SIR and CP-SAVE, we further extend the CP approach 
to the dimension reduction method proposed by Li and Wang (2007) in the 
next section. 

4. Contour projected DR. 

4.1. Motivation. Both SIR and SAVE are commonly used dimension re- 
duction estimators. However, SIR fails to provide exhaustive estimation for 
symmetric regressions [Li (1991) and Cook and Weisberg (1991)]. Although 
SAVE is able to estimate the CS exhaustively [Cook and Lee (1999) and Li 
and Wang (2007)], its estimation efficiency is relatively poor. To mitigate 
the weaknesses and enhance the strengths of SIR and SAVE, Li and Wang 
(2007) recently proposed a novel method, directional regression (DR), for 
dimension reduction. Specifically, they suggested the kernel matrix of DR to
be $E\{2I_p - A^{\circ}(Y, Y^*)\}^2$, where
$A^{\circ}(Y, Y^*) = E\{(X - X^*)(X - X^*)^T \mid Y, Y^*\}$, and $(X^*, Y^*)$
is an independent copy of $(X, Y)$. Moreover, Li and Wang (2007) demonstrated
that this matrix is closely related to that of SIR and SAVE in a very
interesting yet effective manner.

The major advantages of DR are: (i) DR achieves higher estimation
efficiency, and requires fewer computations; (ii) DR is able to estimate the
CS exhaustively. These nice properties motivate us to propose a dimension
reduction method of contour projected directional regression (CP-DR) that
synthesizes the strengths from both the CP and DR approaches. To this end,
we adopt Li and Wang's (2007) approach to consider the following kernel
matrix for CP-DR:

$M_{DR} = E[\{\tau(Y) + \tau(Y^*)\} I_p - A(Y, Y^*)]^2$,

where $A(Y, Y^*) = E\{(\vec{X} - \vec{X}^*)(\vec{X} - \vec{X}^*)^T \mid Y, Y^*\}$,
and $(\vec{X}^*, Y^*)$ is an independent copy of $(\vec{X}, Y)$. Analogously
to Li and Wang (2007), we refer to $\vec{X} - \vec{X}^*$ as the contour
projected empirical direction. Note that, due to the absence of the constant
variance condition, we replace the constant 2 in the DR kernel matrix with
$\{\tau(Y) + \tau(Y^*)\}$ to constitute the CP-DR kernel matrix.

The CP-DR dimension reduction method inherits all nice properties from
DR. Furthermore, CP-DR enables us to handle heavy-tailed predictor
distributions. Finally, CP-DR has the potential to produce a dimension
reduction subspace with a much smaller structural dimension than that of DR
(see Example 3).

4.2. Population exhaustiveness. Similar to CP-SIR and CP-SAVE, we 
present the following result to assure that CP-DR estimates a portion of the 
GCS. 

Theorem 7. Under (2.2), $\mathcal{S}\{[\tau(Y) + \tau(Y^*)] I_p - A(Y, Y^*)\} \subseteq \mathcal{G}_{y|\vec{x}}$.

Because DR estimates the CS exhaustively, it is of great interest to show 
that CP-DR can also estimate the GCS exhaustively. 

Theorem 8. Assume that either: (1) Assumption 1 holds and $Y|\vec{X}$ is
contour asymmetric; or (2) Assumption 2 holds and $Y|\vec{X}$ is dimension
reducible. Under (2.2), we then have $\mathcal{S}(M_{DR}) = \mathcal{G}_{y|\vec{x}}$.

Theorem 8 indicates that CP-DR estimates the GCS exhaustively with- 
out employing the constant variance condition, which is used by various 
exhaustive methods [Li, Zha and Chiaromonte (2005), Zhu and Zeng (2006) 
and Li and Wang (2007)]. 

4.3. A simplified formulation. To reduce the computations necessary for
CP-DR, we simplify the analytical form of the kernel matrix $M_{DR}$ in the
next theorem.

Theorem 9. The matrix $M_{DR}$ can be expressed as

(4.1) $M_{DR} = 2[E\{\tau^2(Y)\} I_p + E\{E^2(\vec{X} \vec{X}^T \mid Y)\} + E^2\{E(\vec{X} \mid Y) E(\vec{X}^T \mid Y)\}$
      $+ E\{E(\vec{X}^T \mid Y) E(\vec{X} \mid Y)\} E\{E(\vec{X} \mid Y) E(\vec{X}^T \mid Y)\}$
      $- 2E\{\tau(Y) E(\vec{X} \vec{X}^T \mid Y)\}]$.

To ease interpretation, one can rewrite (4.1) as

$M_{DR} = 2[E\{E^2[\tau(Y) I_p - \vec{X} \vec{X}^T \mid Y]\} + E^2\{E(\vec{X} \mid Y) E(\vec{X}^T \mid Y)\}$
      $+ E\{E(\vec{X}^T \mid Y) E(\vec{X} \mid Y)\} E\{E(\vec{X} \mid Y) E(\vec{X}^T \mid Y)\}]$.

Thus, it is a natural combination of the kernel matrices of CP-SAVE and
CP-SIR. According to Theorem 9, we are able to estimate
$\mathcal{G}_{y|\vec{x}}$ as long as we can estimate $\tau(Y)$,
$E(\vec{X} \vec{X}^T \mid Y)$ and $E(\vec{X} \mid Y)$ consistently. The
parameter estimators and their properties are discussed in the next section.
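
The algebra behind Theorem 9 can be checked on a small synthetic population in which $Y$ takes finitely many values. The sketch below is an illustration under assumptions, not the paper's proof: it writes $D(Y) = \tau(Y) I_p - E(\vec{X} \vec{X}^T \mid Y)$ and $m(Y) = E(\vec{X} \mid Y)$, and imposes the two mean-zero facts that the expansion appears to rely on, namely $E\{m(Y)\} = 0$ and $E\{D(Y)\} = 0$ (the latter corresponds to $E\{\tau(Y)\} = 1/p$ when $\Sigma = I_p$). The arrays `D` and `m` are arbitrary numbers satisfying these constraints, not moments of an actual distribution.

```python
import numpy as np

rng = np.random.default_rng(5)
p, H = 3, 4
w = rng.dirichlet(np.ones(H))                  # P(Y = h) for h = 0, ..., H-1
m = rng.normal(size=(H, p))
m -= w @ m                                     # enforce E{E(X | Y)} = 0
D = rng.normal(size=(H, p, p))
D = (D + D.transpose(0, 2, 1)) / 2             # symmetric D_h = tau_h I - E(X X^T | Y = h)
D -= np.tensordot(w, D, axes=1)                # enforce E{D(Y)} = 0

# Left side: E[{tau(Y) + tau(Y*)} I - A(Y, Y*)]^2 over independent copies (Y, Y*),
# using {tau + tau*} I - A = D + D* + m m*^T + m* m^T
lhs = np.zeros((p, p))
for h in range(H):
    for k in range(H):
        K = D[h] + D[k] + np.outer(m[h], m[k]) + np.outer(m[k], m[h])
        lhs += w[h] * w[k] * (K @ K)

# Right side: the rewritten form of (4.1)
E_D2 = sum(w[h] * (D[h] @ D[h]) for h in range(H))
E_mm = sum(w[h] * np.outer(m[h], m[h]) for h in range(H))
E_mTm = sum(w[h] * (m[h] @ m[h]) for h in range(H))
rhs = 2 * (E_D2 + E_mm @ E_mm + E_mTm * E_mm)
```

The two sides agree to rounding error, which makes the computational gain concrete: the right side needs only single-slice quantities, whereas the left side is a double sum over pairs of slices.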

5. The sampling properties. 

5.1. The estimators of $\mu$, $\Sigma$ and $\vec{X}$. Without loss of
generality, we theoretically assume that $\mu = 0$ and $\Sigma = I_p$ (see
Section 2.2). In practice, however, both $\mu$ and $\Sigma$ are often unknown
and have to be estimated from the data. It is noteworthy that both the
distributions of $X$ and $\vec{X}$ are elliptically symmetric with the same
contour shapes. Consequently, they share the same scatter matrix $\Sigma$ via
the identifiable constraint $\mathrm{tr}(\Sigma) = p$. Thus, the estimated
scatter matrix of $\vec{X}$ can be used to estimate that of $X$; see Tyler
(1987).

More specifically, let $(y_i, x_i)$ be the observation collected from the ith
subject ($1 \le i \le n$), where $y_i \in \mathbb{R}^1$ is the response and
$x_i = (x_{i1}, \ldots, x_{ip})^T \in \mathbb{R}^p$ is a $p \times 1$ predictor
vector. To estimate $\mu$ and $\Sigma$, we follow the method of Tyler (1987)
and define $\hat{\mu}^{(0)} = (\hat{\mu}_1^{(0)}, \ldots, \hat{\mu}_p^{(0)})^T$,
where $\hat{\mu}_j^{(0)}$ is the median of $\{x_{ij} : i = 1, \ldots, n\}$ for
every $1 \le j \le p$. As a result, $\hat{\mu}^{(0)}$ is $\sqrt{n}$-consistent.
Next, we estimate $\Sigma$ and $\mu$ by iterating the following two equations
[Tyler (1987)]:

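(5.1) $\hat{\Sigma}^{(m+1)} \propto n^{-1} \sum_{i=1}^{n} \frac{(x_i - \hat{\mu}^{(m)})(x_i - \hat{\mu}^{(m)})^T}{\|x_i - \hat{\mu}^{(m)}\|^2_{\hat{\Sigma}^{(m)}}}$

and

(5.2) $\hat{\mu}^{(m+1)} = \Big( n^{-1} \sum_{i=1}^{n} \frac{1}{\|x_i - \hat{\mu}^{(m)}\|_{\hat{\Sigma}^{(m)}}} \Big)^{-1} \Big( n^{-1} \sum_{i=1}^{n} \frac{x_i}{\|x_i - \hat{\mu}^{(m)}\|_{\hat{\Sigma}^{(m)}}} \Big)$,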
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where || ■ ||s is the Mahalanobis norm as defined in Section 2.2 and (fi^ , E^ m )) 
is the estimator of (fi, S) obtained in the mth step. We iterate the proce- 
dures of (5.1) and (5.2) until they converge, and denote the resulting es- 
timator by (/2, £). For the sake of simplicity, one can fix A (m) = A (0) 

(for 

every m > 1) without iterating (5.2). The asymptotic efficiency of S is not 
affected as long as the predictor distribution is elliptically symmetric. Then, 
by Tyler's (1987) Theorem 2.2, this iterating process is guaranteed to con- 
verge computationally with probability 1. Furthermore, by Tyler's (1987) 
Theorem 4.2, the estimator £ is -^/n-consistent. In other words, we have 
|| £ — S|| = O p (n -1 / 2 ), where \\H\\ is defined to be the maximum of the ab- 
solute singular value of a matrix H, and it becomes the usual Li norm if H 
is a vector. Using fi and E, we obtain the estimator of the contour projected 
predictor, H?i = S~ 1 / 2 (xj — p)/\\xi — Alls- 
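
The estimation steps of this subsection can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the simplified variant in which the location is fixed at the coordinate-wise median, takes the Mahalanobis norm to be $\|x\|_\Sigma = (x^T\Sigma^{-1}x)^{1/2}$, and uses hypothetical function names (`tyler_scatter`, `contour_project`), iteration cap and tolerance.

```python
import numpy as np

def tyler_scatter(x, mu, max_iter=500, tol=1e-10):
    """Fixed-point iteration in the spirit of (5.1), normalized so tr(Sigma) = p.

    x  : (n, p) array of predictors
    mu : (p,) location estimate, held fixed (e.g. coordinate-wise median)
    """
    n, p = x.shape
    z = x - mu                                   # centered observations
    sigma = np.eye(p)                            # starting value (an assumption)
    for _ in range(max_iter):
        # squared Mahalanobis norms ||x_i - mu||^2 under the current sigma
        d2 = np.einsum("ij,jk,ik->i", z, np.linalg.inv(sigma), z)
        new = (z / d2[:, None]).T @ z / n        # weighted outer-product average
        new *= p / np.trace(new)                 # identifiability constraint
        converged = np.linalg.norm(new - sigma, ord=2) < tol
        sigma = new
        if converged:
            break
    return sigma

def contour_project(x, mu, sigma):
    """Return Sigma^{-1/2}(x_i - mu) / ||x_i - mu||_Sigma for every row of x."""
    vals, vecs = np.linalg.eigh(sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric Sigma^{-1/2}
    w = (x - mu) @ inv_sqrt
    return w / np.linalg.norm(w, axis=1, keepdims=True)
```

Every projected row has unit Euclidean length by construction, which is the property the later kernel-matrix estimators rely on.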

5.2. A preliminary result. To establish the $\sqrt{n}$-consistency of the three
CP estimators, we first introduce a technical assumption and then present
one preliminary result.

Assumption 3. Assume $\|X\|^2$ has a continuous distribution with a
probability density function $h(\cdot)$. We further assume that there exist
constants $\alpha > 1$ and $C_\alpha > 0$ such that $t^{-\alpha}h(t) \to C_\alpha$ as $t \to 0$.

This is a very mild yet reasonable assumption. For example, if $X$ follows
a standard normal distribution, then $\|X\|^2$ follows a chi-square distribution
with $p$ degrees of freedom. Thus, Assumption 3 is satisfied with
$\alpha = p/2 - 1$ as long as $p > 4$. Analogously, if $X$ follows a multivariate
$t$-distribution with df degrees of freedom, then $\|X\|^2/p$ follows an
$F$-distribution with degrees of freedom $(p, df)$. Once again, Assumption 3
is satisfied with $\alpha = p/2 - 1$ as long as $p > 4$. Applying the above
assumption, we obtain the following preliminary result.
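
As a quick numerical illustration of the normal case, the sketch below evaluates $t^{-\alpha}h(t)$ for the chi-square density with $\alpha = p/2 - 1$ and compares it with the limiting constant $1/\{2^{p/2}\Gamma(p/2)\}$, which follows directly from the chi-square density formula. The function names are hypothetical and the choice $p = 6$ in the usage lines is arbitrary.

```python
import math

def chi2_pdf(t, p):
    """Density of a chi-square variable with p degrees of freedom at t > 0."""
    return t ** (p / 2 - 1) * math.exp(-t / 2) / (2 ** (p / 2) * math.gamma(p / 2))

def scaled_density(t, p):
    """t^{-alpha} h(t) with alpha = p/2 - 1, the quantity in Assumption 3."""
    alpha = p / 2 - 1
    return t ** (-alpha) * chi2_pdf(t, p)

def limit_constant(p):
    """Limit of t^{-alpha} h(t) as t -> 0 for the chi-square density."""
    return 1.0 / (2 ** (p / 2) * math.gamma(p / 2))

# Usage: the scaled density approaches the constant as t shrinks.
for t in (1e-2, 1e-4, 1e-6):
    print(t, scaled_density(t, 6), limit_constant(6))
```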

Theorem 10. Under Assumption 3, we have: (i) $E\|X\|^{-4} < \infty$.
Furthermore, assume that $\hat\mu \in \mathbb{R}^p$ is an arbitrary random vector satisfying
$\hat\mu = O_p(n^{-1/2})$, and $\hat\Sigma \in \mathbb{R}^{p \times p}$ is an arbitrary random matrix
satisfying $\hat\Sigma \to_P I_p$. We then have:

(ii) [display equation lost in extraction] $\to_P 0$.



Theorem 10(i) indicates that $\|X\|^{-4}$ has a finite moment, which plays
an important role in ensuring the $\sqrt{n}$-consistency of $\hat\Sigma$; see Theorem 4.2 of
Tyler (1987). Hence, it will be useful for us to show the $\sqrt{n}$-consistency of
the CP estimators. In addition, Theorem 10(ii) allows us to replace
$\|x_i - \hat\mu\|_{\hat\Sigma}$ by $\|x_i\|_{\hat\Sigma}$, which simplifies the theoretical proof of
$\sqrt{n}$-consistency; see Appendix A.14 for the details.

5.3. $\sqrt{n}$-consistency of CP-SIR. Without loss of generality, we assume
that $Y$ is discrete and has a finite support $\{1, \ldots, K\}$ with $K \ge 2$ [see Li
(1991) and Cook and Ni (2005)]. Next, we define $z_{ik} = 1$ if $y_i = k$ and 0
otherwise. Then, $n_k = \sum_i z_{ik}$ is the number of observations falling into the
$k$th slice. Under this setting, the kernel matrix of CP-SIR can be defined
as $M_{SIR} = \mathrm{cov}\{E(\vec{X}|Y)\} = T_{SIR}T_{SIR}^T$, where
$T_{SIR} = \{E(\vec{X}|Y = 1)\sqrt{p_1}, \ldots, E(\vec{X}|Y = K)\sqrt{p_K}\} \in \mathbb{R}^{p \times K}$
and $p_k = P(Y = k) = E(z_{ik})$. Thus, it is natural to estimate $M_{SIR}$ by
$\hat{M}_{SIR} = \hat{T}_{SIR}\hat{T}_{SIR}^T$, where
$\hat{T}_{SIR} = (\bar{x}_1\sqrt{\hat{p}_1}, \ldots, \bar{x}_K\sqrt{\hat{p}_K})$, $\hat{p}_k = n_k/n$, and
$\bar{x}_k = n_k^{-1}\sum_i \hat{\vec{x}}_i z_{ik}$. In the following theorem, we show that
$\hat{M}_{SIR}$ is $\sqrt{n}$-consistent.

Theorem 11. Under Assumption 3, we have
$\bar{x}_k - E(\vec{X}|Y = k) = O_p(n^{-1/2})$.

The above theorem together with $\hat{p}_k - p_k = O_p(n^{-1/2})$ implies
$\|\hat{M}_{SIR} - M_{SIR}\| = O_p(n^{-1/2})$. Hence, CP-SIR achieves $\sqrt{n}$-consistency.
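
The CP-SIR kernel estimate is a weighted sum of outer products of slice means. A minimal sketch, assuming the contour-projected predictors are already available as the rows of an array and that the slice labels are integers (the function name `cp_sir_kernel` is hypothetical):

```python
import numpy as np

def cp_sir_kernel(x_proj, slices):
    """Estimate the CP-SIR kernel matrix from projected predictors.

    x_proj : (n, p) array, rows are contour-projected predictors
    slices : (n,) array of slice labels for the discrete response
    """
    n, p = x_proj.shape
    m_hat = np.zeros((p, p))
    for k in np.unique(slices):
        mask = slices == k
        p_hat = mask.mean()                    # n_k / n
        x_bar = x_proj[mask].mean(axis=0)      # slice mean of projected predictors
        m_hat += p_hat * np.outer(x_bar, x_bar)
    return m_hat
```

The leading eigenvectors of the returned matrix would then serve as the estimated directions.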

5.4. $\sqrt{n}$-consistency of CP-SAVE and CP-DR. Under the assumption
that $Y$ is discrete, the kernel matrix of CP-SAVE given by
$M_{SAVE} = E[\tau(Y)I_p - E(\vec{X}\vec{X}^T|Y)]^2$ in Section 3.2 can be written as
$M_{SAVE} = \sum_k p_k[\tau_k I_p - E(\vec{X}\vec{X}^T|Y = k)]^2$, where $\tau_k = \tau(Y = k)$. For
the given data, we can estimate this kernel matrix by
$\hat{M}_{SAVE} = \sum_k \hat{p}_k(\hat\tau_k I_p - \hat\Sigma_k)^2$, where
$\hat\Sigma_k = n_k^{-1}\sum_i \hat{\vec{x}}_i\hat{\vec{x}}_i^T z_{ik}$ and $\hat\tau_k$ is the median of
$\hat\Sigma_k$'s eigenvalues. The reason for using the median of $\hat\Sigma_k$'s eigenvalues
to estimate $\tau_k$ is as follows. By Theorem 6, we know that most of the
eigenvalues of $\Sigma_k = E(\vec{X}\vec{X}^T|Y = k)$ are equal to $\tau_k$, except for those
eigenvalues associated with $\mathcal{G}_{y|\vec{x}}$. As a result, the median of $\Sigma_k$'s
eigenvalues is $\tau_k$ whenever $d_0 < p/2$. Applying similar techniques to those
used in the proof of Theorem 11, we are able to show that
$\|\hat\Sigma_k - \Sigma_k\| = O_p(n^{-1/2})$. Consequently, we have
$\hat\tau_k - \tau_k = O_p(n^{-1/2})$; see Eaton and Tyler (1994). Therefore,
$\|\hat{M}_{SAVE} - M_{SAVE}\| = O_p(n^{-1/2})$, which implies that CP-SAVE is
$\sqrt{n}$-consistent. Furthermore, by Theorem 9, we find that $M_{DR}$ is closely
related to the kernel matrices of CP-SIR and CP-SAVE. Therefore, $M_{DR}$ can be
estimated analogously, and the resulting estimator is also $\sqrt{n}$-consistent.
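
The CP-SAVE estimate described above can be sketched as follows, again assuming projected predictors and integer slice labels as inputs; the function name `cp_save_kernel` is hypothetical, and the median-eigenvalue step is the one motivated in the text for $d_0 < p/2$.

```python
import numpy as np

def cp_save_kernel(x_proj, slices):
    """Estimate the CP-SAVE kernel matrix from projected predictors."""
    n, p = x_proj.shape
    m_hat = np.zeros((p, p))
    for k in np.unique(slices):
        xk = x_proj[slices == k]
        n_k = xk.shape[0]
        p_hat = n_k / n
        sigma_k = xk.T @ xk / n_k                        # within-slice second moment
        tau_k = np.median(np.linalg.eigvalsh(sigma_k))   # median eigenvalue
        h = tau_k * np.eye(p) - sigma_k
        m_hat += p_hat * (h @ h)
    return m_hat
```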

5.5. The estimation of structural dimension. In this subsection, we
propose an informal but effective method for determining the structural
dimension. For the sake of convenience, we use the generic notation $M$ and
$\hat{M}$ to represent a kernel matrix and its consistent estimator, respectively.
In addition, we assume that the structural dimension of $M$ is $d_0$, that is,
$\lambda_j > 0$ for any $j \le d_0$ but $\lambda_j = 0$ for $j > d_0$, where $\lambda_j$ is the $j$th
largest eigenvalue of $M$. Moreover, we define $\hat{r}_j = \hat\lambda_j/\hat\lambda_{j+1}$ for
$1 \le j \le p - 1$, where $\hat\lambda_j$ is the $j$th largest eigenvalue of $\hat{M}$.
Intuitively, if $j < d_0$, both $\hat\lambda_j$ and $\hat\lambda_{j+1}$ converge in probability to
positive constants. Thus, we have $\hat{r}_j = O_p(1)$ for any $j < d_0$. On the other
hand, if $j > d_0$, both $\hat\lambda_j$ and $\hat\lambda_{j+1}$ converge in probability to 0. We then
assume that both $\hat\lambda_j$ and $\hat\lambda_{j+1}$ share the same convergence speed [this
assumption is indeed valid if $\sqrt{n}(\hat{M} - M)$ is asymptotically normal; see
Eaton and Tyler (1994)]. Under this assumption, we also have $\hat{r}_j = O_p(1)$
for any $j > d_0$. However, if $j = d_0$, then $\hat{r}_j \to \infty$. This is because
$\hat\lambda_{d_0} \to \lambda_{d_0} > 0$ but $\hat\lambda_{d_0+1} \to 0$. Consequently, one can estimate $d_0$
by $\hat{d} = \arg\max_{1 \le j \le d_{\max}} \hat{r}_j$, where $d_{\max}$ is the maximum dimension
given a priori. In practice, we recommend using $d_{\max} = 5$, which is large
enough for the purpose of dimension reduction. We refer to this estimation
method as Maximal Eigenvalue Ratio Criterion (MERC), and simulation
studies in Section 6.3 suggest that such a simple method works fairly well.
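
A minimal sketch of the eigenvalue-ratio rule is given below. The function name `merc` and the small positive floor on the eigenvalues are assumptions added here to avoid division by zero; they are not part of the criterion as stated.

```python
import numpy as np

def merc(m_hat, d_max=5, floor=1e-12):
    """Pick the dimension that maximizes the ratio of adjacent eigenvalues."""
    lam = np.sort(np.linalg.eigvalsh(m_hat))[::-1]   # eigenvalues, descending
    lam = np.maximum(lam, floor)                     # guard against zeros (assumption)
    d_max = min(d_max, lam.size - 1)
    ratios = lam[:d_max] / lam[1:d_max + 1]          # r_j = lambda_j / lambda_{j+1}
    return int(np.argmax(ratios)) + 1                # dimensions are 1-based
```

For example, a kernel estimate with two dominant eigenvalues followed by near-zero ones yields the value 2.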

6. Monte Carlo studies. 

6.1. Simulation settings. To evaluate the finite sample performance of
CP methods, we conducted extensive Monte Carlo simulations. In these
studies, $\tilde{X} = (X^T, \varepsilon)^T$ were independently generated from
$W/\sqrt{V_{df}/df}$, where $W \in \mathbb{R}^{p+1}$ is a $(p+1)$-dimensional standard normal
random vector, $V_{df}$ is a chi-squared random variable with df degrees of
freedom, and $V_{df}$ is independent of $W$. As a result, $\tilde{X}$ follows a
multivariate $t$ distribution with df degrees of freedom [see Lange, Little and
Taylor (1989)]. In addition, the marginal distributions of $X$ and $\varepsilon$ are a
$p$-variate $t$ distribution and a univariate $t$ distribution, respectively [see
Muirhead (1982)]. Thus, the probability density function of $X$ has the form
of (2.2). We then simulated $Y = g(B_0^T\vec{X}, \|X\|, \varepsilon)$, where
$B_0 = (\beta_{01}, \ldots, \beta_{0d_0})^T$ and $g(\cdot)$ are pre-specified for each model given
in the next subsection. Under this setting, one can show that $\vec{X}$ is
independent of $(\|X\|, \varepsilon)$, which leads to $Y \perp\!\!\!\perp \vec{X} | B_0^T\vec{X}$. It is
noteworthy that $X$ and $\varepsilon$ are not independent, except when $X$ follows a
normal distribution (i.e., df = ∞).
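
The data-generating step can be sketched as below. The function name `simulate_predictors` is hypothetical; the normal case is obtained by passing `np.inf` as the degrees of freedom, in which case no scaling is applied.

```python
import numpy as np

def simulate_predictors(n, p, df, rng):
    """Draw (X, eps) jointly as a (p+1)-variate t vector with df degrees of freedom."""
    w = rng.standard_normal((n, p + 1))          # standard normal W
    if np.isinf(df):
        x_tilde = w                              # df = infinity: normal case
    else:
        v = rng.chisquare(df, size=(n, 1))       # V_df, independent of W
        x_tilde = w / np.sqrt(v / df)
    return x_tilde[:, :p], x_tilde[:, p]         # predictors X and error eps
```

Because a single chi-square draw scales the whole row, the error term shares its scale with the predictors, which is why the two are dependent unless df is infinite.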

There are five models considered in our simulation studies. For each
model, we simulated 500 data sets. In addition, the number of covariates
is $p = 20$ and the number of slices is $H = 5$. Moreover, four degrees of
freedom are examined, namely df = 1, 3, 5 and ∞. They represent the cases
in which no moments exist, moments exist only up to the second order,
moments exist up to the fourth order, and the distribution is normal,
respectively. For each model, three different inverse regression methods
(i.e., SIR, SAVE and DR) and their corresponding contour projected
approaches are compared. To evaluate the accuracy of the estimate $\hat{B}$, we
adopt Li and Wang's (2007) distance measure,
$\Delta(B_0, \hat{B}) = \mathrm{tr}\{(P_{B_0} - P_{\hat{B}})(P_{B_0} - P_{\hat{B}})\}/d_0$. A smaller value of
$\Delta$ indicates a better estimate. For the sake of simplicity, we slightly abuse
notation by using $B_0 = (\beta_{01}, \ldots, \beta_{0d_0})^T$ to commonly represent the true
parameter matrix for each of the five models.
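
A small sketch of the distance computation follows. It assumes each basis is stored as a $p \times d$ array with linearly independent columns, so the projection can be formed from a QR factorization; the function names are hypothetical, and the division by the number of columns of the true basis reflects the normalization by $d_0$.

```python
import numpy as np

def projection(b):
    """Orthogonal projection matrix onto the column space of b (p x d, full rank)."""
    q, _ = np.linalg.qr(b)
    return q @ q.T

def delta(b0, b_hat):
    """Trace distance between the two projections, divided by the true dimension."""
    diff = projection(b0) - projection(b_hat)
    return np.trace(diff @ diff) / b0.shape[1]
```

The measure depends only on the spanned subspaces, so rescaling a basis leaves it unchanged, and two orthogonal one-dimensional subspaces are at distance 2.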

6.2. Estimation accuracy. To evaluate the estimation accuracy of the
proposed CP methods, we present the detailed structures of the five models
given below:

(I) Linear conditional mean model [Li (1991) and Ni, Cook and
Tsai (2005)], $Y = \beta_{01}^T X + 0.5\varepsilon = \beta_{01}^T\vec{X} \times \|X\| + 0.5\varepsilon$, where
$\beta_{01} = (1, 1, 1, 0, \ldots, 0)^T \in \mathbb{R}^p$ and $d_0 = 1$.

(II) Symmetric conditional mean model [Li, Zha and Chiaromonte
(2005)], $Y = (\beta_{01}^T X)^2 + \beta_{02}^T X + 0.2\varepsilon = (\beta_{01}^T\vec{X})^2 \times \|X\|^2 + \beta_{02}^T\vec{X} \times \|X\| + 0.2\varepsilon$,
where $\beta_{01} = (1, 0, \ldots, 0)^T \in \mathbb{R}^p$, $\beta_{02} = (0, 1, \ldots, 0)^T \in \mathbb{R}^p$
and $d_0 = 2$.

(III) Discrete response model [Zhu and Zeng (2006)],
$Y = I(\beta_{01}^T X + 0.2\varepsilon > 0) + 2I(\beta_{02}^T X + 0.2\varepsilon > 0) = I(\beta_{01}^T\vec{X} \times \|X\| + 0.2\varepsilon > 0) + 2I(\beta_{02}^T\vec{X} \times \|X\| + 0.2\varepsilon > 0)$,
where $I(\cdot)$ denotes the indicator function, $\beta_{0j} \in \mathbb{R}^p$ ($j = 1, 2$), and
$d_0 = 2$. The first four components of $\beta_{01}$ and the seventh to tenth
components of $\beta_{02}$ are taken to be 1, while the rest of the components of
$\beta_{01}$ and $\beta_{02}$ are fixed to be 0.

(IV) Heterogeneous variance model [Li, Zha and Chiaromonte (2005)],
$Y = 0.5(\beta_{01}^T X - 0.5)^2\varepsilon = 0.5(\beta_{01}^T\vec{X}\|X\| - 0.5)^2\varepsilon$, where
$\beta_{01} = (1, 0, \ldots, 0)^T \in \mathbb{R}^p$ and $d_0 = 1$.

(V) Contour symmetric model, $Y = (\beta_{01}^T X)^2 \times \|X\|^{-2} + 0.2\varepsilon = (\beta_{01}^T\vec{X})^2 + 0.2\varepsilon$,
where $\beta_{01} = (1, 1, 0, \ldots, 0)^T \in \mathbb{R}^p$ and $d_0 = 1$.
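
To make the link between the two forms of each model concrete, the sketch below generates responses for models I and V only, taking the identity scatter matrix so that the contour projection is $X/\|X\|$. The function names are hypothetical, and the inputs are assumed to be an $(n, p)$ predictor array and an $(n,)$ error vector.

```python
import numpy as np

def model_I(x, eps):
    """Model I: linear conditional mean with beta = (1, 1, 1, 0, ..., 0)."""
    beta = np.zeros(x.shape[1])
    beta[:3] = 1.0
    return x @ beta + 0.5 * eps

def model_V(x, eps):
    """Model V: response depends on X only through its direction X / ||X||."""
    beta = np.zeros(x.shape[1])
    beta[:2] = 1.0
    x_vec = x / np.linalg.norm(x, axis=1, keepdims=True)   # contour projection
    return (x_vec @ beta) ** 2 + 0.2 * eps
```

In model V the radius cancels, so the signal part of the response is bounded regardless of how heavy the tails of the predictors are.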

It is noteworthy that GCS = CS in models I to IV, while GCS ≠ CS in
model V. We consider three sample sizes (n = 200, 400 and 1000) in the Monte
Carlo studies. To save space, Table 1 only reports the results with n = 400.
We find that the performance of the three non-CP methods deteriorates
seriously as the tail of the predictor distribution becomes heavier. However,
the CP methods (particularly the CP-DR method) perform satisfactorily
across all simulation settings.

6.3. Dimension determination. Employing models I-V discussed in the
previous subsection, we study the finite sample performance of MERC.
Because the three CP methods and the four dfs yield qualitatively similar
findings, we only report the results of CP-DR with df = 3. Table 2 indicates
that the percentage of $\hat{d} = d_0$ steadily approaches 100% as the sample size
increases. Consequently, MERC is a simple and effective method to determine
the structural dimension in large samples.

Table 1
The average of $\Delta(B_0, \hat{B})$ for models I-V with n = 400

Model  df   CP-DR        DR           CP-SIR       SIR          CP-SAVE      SAVE
I      ∞    0.024        0.023        0.020        0.020        0.096        0.096
       5    0.030        0.062        0.026        0.058        0.130        0.810
       3    0.033        0.517        0.028        0.137        0.158        1.693
       1    0.580        1.859        0.076        0.800        1.238        1.817
II     ∞    [illegible]  [illegible]  0.989        [illegible]  [illegible]  [illegible]
       5    0.160        0.678        0.992        1.030        0.448        0.965
       3    0.183        1.132        0.993        1.084        0.537        1.172
       1    0.828        1.526        1.035        1.492        1.130        1.528
III    ∞    0.169        0.177        0.192        0.194        0.650        0.699
       5    0.165        0.678        0.185        0.233        0.643        1.427
       3    0.172        1.294        0.190        0.316        0.639        1.674
       1    0.591        1.772        0.214        0.973        1.087        1.789
IV     ∞    0.238        0.244        0.473        0.513        0.397        0.370
       5    0.317        1.198        0.600        0.925        0.495        1.256
       3    0.409        1.719        0.736        1.334        0.573        1.726
       1    1.534        1.878        1.252        1.880        1.587        1.879
V      ∞    0.302        0.341        1.897        1.900        0.302        0.341
       5    0.401        1.575        1.894        1.894        0.401        1.575
       3    0.464        1.839        1.891        1.886        0.463        1.839
       1    1.549        1.896        1.889        1.906        1.548        1.896


6.4. Asymmetric predictors. In this paper, the CP theory requires the
distribution of the predictor vector to be elliptically symmetric. However, in
practice, such an assumption could be violated to some extent. Thus, it is
of interest to evaluate the CP performance under nonelliptically symmetric
distributions. To this end, we regenerate the $W$ random vector (defined in
Section 6.1) from a centralized standard exponential distribution, that is,
exp(1) - 1. We then replicate the same simulation experiments as given
in the previous subsection. Because the results are qualitatively similar to
those in Table 1, we only report the results with df = 3 and n = 400. Table 3
shows that the CP methods perform reasonably well even with asymmetric
predictors.

Table 2
The percentage of $\hat{d} = d_0$ for CP-DR with df = 3

            The five models
n       I      II     III    IV     V
200     0.996  0.588  0.324  0.620  0.658
400     1.000  0.962  0.642  0.916  0.904
1000    1.000  1.000  1.000  1.000  1.000

Table 3
The average of $\Delta(B_0, \hat{B})$ under asymmetric predictors with (n, df) = (400, 3)

Model  CP-DR  DR     CP-SIR  SIR    CP-SAVE  SAVE
I      0.049  0.682  0.034   0.154  0.737    1.764
II     0.092  0.691  0.542   0.827  0.164    0.822
III    0.153  1.317  0.130   0.254  1.164    1.717
IV     0.719  1.675  1.307   1.813  0.781    1.677
V      0.289  1.859  1.343   1.464  0.298    1.862
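
The asymmetric design of this subsection changes only the distribution of $W$. A minimal sketch, with a hypothetical function name and the same scaling by an independent chi-square variable as in Section 6.1:

```python
import numpy as np

def simulate_asymmetric(n, p, df, rng):
    """Draw (X, eps) with W taken from a centered standard exponential, exp(1) - 1."""
    w = rng.exponential(1.0, size=(n, p + 1)) - 1.0   # mean-zero but skewed
    v = rng.chisquare(df, size=(n, 1))                # V_df, independent of W
    x_tilde = w / np.sqrt(v / df)
    return x_tilde[:, :p], x_tilde[:, p]
```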

7. A real example. To demonstrate the practical usefulness of the CP 
methodology, we consider an example from the Chinese stock market. The 
dataset is obtained from the CCER database, which is one of the most au- 
thoritative commercial databases for the Chinese stock market 
(http://www.ccerdata.com/). It contains yearly accounting information 
for the firms that are publicly listed in the Chinese stock market during 
the period from 1997 to 2000. The total sample size is 2951 observations. 
The objective of this study is to understand these firms' earnings patterns, 
which can be useful information for an investment decision. To this end, the 
response is the firm's next year return on equity (ROEt). The predictors 
include the current year accounting variables: return on equity (ROE), log- 
transformed total assets (ASSET), profit margin ratio (PM), sales growth 
rate (GROWTH), leverage level (LEV) and asset turnover ratio (ATO). 
A simple calculation shows that the averaged yearly kurtoses of the
aforementioned explanatory variables are 260.66 (ROE), 2.82 (ASSET),
365.03 (PM), 224.98 (GROWTH), 10.04 (LEV) and 10.64 (ATO), respec- 
tively. Thus, all predictors other than ASSET have heavy-tailed distribu- 
tions, which motivates us to employ CP for estimating parameters. All pre- 
dictors are appropriately scaled so that the resulting diagonal components 
of the scatter matrix are equal to one. 

Because CP-DR yields more reliable estimates than the other methods
in simulation studies, we employ CP-DR to analyze the data. The resulting
structural dimension estimated by MERC is $\hat{d} = 2$, and the first two CP-DR
estimates are given below.

Direction        ROE     ASSET   PM     GROWTH  LEV    ATO
$\hat\beta_1$     0.936  -0.008  0.120  0.139   0.113  0.280
$\hat\beta_2$    -0.279  -0.045  0.825  0.167   0.168  0.428

The above estimates clearly indicate that ROE, PM and ATO are the most
important variables associated with the firm's future earnings. To further
understand their effects, Figure 1 depicts the scatter plots of $Y$ versus
$\hat\eta_j = \hat\beta_j^T\vec{x}$ ($j = 1, 2$). For a better view, we trim off the most extreme
5% of the observations according to the response ROEt. As a result, both
plots display a monotonically increasing pattern for those observations with
ROEt > 0. Practically, the first index $\hat\eta_1$ can be viewed as the
autocorrelation index because the majority of the weight is loaded on the
predictor ROE. Moreover, the second index $\hat\eta_2$ can be regarded as the ratio
index because the notable weights are loaded on the ratio predictors PM and
ATO. Consequently, the higher the return on equity or the larger the profit
margin ratio and turnover ratio, the greater the yield of future return (i.e.,
ROEt).

Fig. 1. Scatter plots of ROEt versus $\hat\eta_j$ ($j = 1, 2$).

Although both CP-DR directions depict the monotonically increasing pat- 
tern when ROEt > 0, there is no clear pattern that can be identified for those 
observations with ROEt < 0. This is because firms operating in the Chinese 
stock market have extremely strong motivation to avoid reporting negative 
earnings. Otherwise, they might be subject to severe punishment from the 
China Securities Regulatory Commission (the government body overseeing
the stock market). In addition, the firms with negative ROEt values are 
typically among those with relatively poor earnings capability. Thus, they 
are most likely to be involved in heavy earnings management. This induces 
a value-destroying process [Jiang and Wang (2008)] and makes the resulting 
earnings pattern (i.e., the regression relationship as in Figure 1) depart from 
the fundamental economic rules. Consequently, no clear regression pattern 
can be detected for those observations with ROEt < 0. 

8. Discussion. In this paper, we employ the contour projected approach 
to establish a new theory for sufficient dimension reduction. Our approach 
leads to the notion of GCS, which is closely related to, but very different 
from, the traditional CS. To estimate the GCS, we employ three methods, 
CP-SIR, CP-SAVE and CP-DR, via the CP theory. Monte Carlo studies
demonstrate that they are superior to SIR, SAVE and DR, respectively,
especially in cases where the predictors have heavy-tailed distributions. In 
the development of CP theory, we mainly focus on population properties. 
Therefore, several important topics worth further investigation remain. The 
first avenue of research would be to further establish the asymptotic nor- 
mality of the three contour projected estimators [Zhu and Fang (1996) and 
Li and Zhu (2007)] as well as to investigate the consistency of the structural 
dimension estimator, MERC. The second area would be to extend the contour
projected approach to existing dimension reduction methods such as
dimension reduction via higher-order moments [Yin and Cook (2002, 2003,
2004)], dimension reduction in multivariate regressions [Cook and Setodji
(2003), Li et al. (2003) and Li, Wen and Zhu (2008)] and shrinkage inverse
regressions [Ni, Cook and Tsai (2005)]. We believe that these efforts would
enhance the usefulness of CP in dimension reduction. 

APPENDIX 

A.1. Proof of Theorem 1.

(Sufficiency of contour asymmetry). According to Proposition 6.2
of Cook (1998b), $\mathcal{C}_{y|\vec{x}}$ is unique, if it exists. Hence, we only need to
establish the existence of the CCS. Furthermore, applying the results of Cook
(1996) and Cook (1998b), it is equivalent to show that the intersection
[denoted by $\mathcal{S}(\delta)$] of two arbitrary SCSs [denoted by $\mathcal{S}(\alpha)$ and $\mathcal{S}(\beta)$]
is still a SCS.

If $\mathcal{S}(\alpha) \subseteq \mathcal{S}(\beta)$ or $\mathcal{S}(\alpha) \supseteq \mathcal{S}(\beta)$, then $\mathcal{S}(\alpha) \cap \mathcal{S}(\beta)$ is a
SCS. Thus, we only need to consider $\mathcal{S}(\alpha) \not\subseteq \mathcal{S}(\beta)$ and
$\mathcal{S}(\alpha) \not\supseteq \mathcal{S}(\beta)$, which indicates that $\mathcal{S}(\delta) \ne \mathcal{S}(\alpha)$ and
$\mathcal{S}(\delta) \ne \mathcal{S}(\beta)$. As a result, the bases $\alpha$ and $\beta$ can be further
decomposed as $\alpha = (\alpha_1, \delta)$ and $\beta = (\beta_1, \delta)$, respectively, for some
$\alpha_1 \ne 0$ and $\beta_1 \ne 0$. Then, define
$W = (W_1, W_2, W_3) = (\alpha_1^T\vec{X}, \beta_1^T\vec{X}, \delta^T\vec{X}) = \eta^T\vec{X}$, where the support
of $\vec{X}$ is the unit contour $\{\vec{x} : \|\vec{x}\|^2 = 1\}$. Consequently, the support of
$W$ is either a unit contour, if $\mathrm{rank}(\eta) = p$, or the convex set
$\{w : \|w\| \le 1\}$ if $\mathrm{rank}(\eta) < p$. This allows us to consider the two separate
cases given below to prove that $\mathcal{S}(\delta)$ is a SCS.

Case 1 [$\mathrm{rank}(\eta) < p$]. Let
$\Omega_{12|3}(w_3) = \{(w_1, w_2) : \|w_1\|^2 + \|w_2\|^2 \le 1 - \|w_3\|^2\}$, which is the
support of the conditional distribution $(W_1, W_2)|W_3 = w_3$. For any
$(w_1, w_2) \in \Omega_{12|3}(w_3)$, we have
$\|w_1\|^2 + 0^2 \le \|w_1\|^2 + \|w_2\|^2 \le 1 - \|w_3\|^2$. Hence,
$(w_1, 0) \in \Omega_{12|3}(w_3)$. Because both $\mathcal{S}(\eta)$ and $\mathcal{S}(\alpha)$ are SCSs, we obtain

      $G_y(\vec{X} = \vec{x}) = G_y(W_1 = w_1, W_2 = w_2, W_3 = w_3)$
(A.1)          $= G_y(W_1 = w_1, W_3 = w_3)$
               $= G_y(W_1 = w_1, W_2 = 0, W_3 = w_3)$.

Let $(w_1^*, w_2^*)$ be another arbitrary point in $\Omega_{12|3}(w_3)$ such that
$\|w_1^*\|^2 + \|w_2^*\|^2 \le 1 - \|w_3\|^2$. This implies that
$\|w_1^*\|^2 + 0^2 \le 1 - \|w_3\|^2$, which leads to $(w_1^*, 0) \in \Omega_{12|3}(w_3)$. This,
together with the fact that $\mathcal{S}(\beta)$ is a SCS, results in

      $G_y(W_1 = w_1, W_2 = 0, W_3 = w_3) = G_y(W_2 = 0, W_3 = w_3)$
(A.2)
               $= G_y(W_1 = w_1^*, W_2 = 0, W_3 = w_3)$.

Equations (A.1) and (A.2) yield

(A.3) $G_y(\vec{X} = \vec{x}) = G_y(W_1 = w_1^*, W_2 = 0, W_3 = w_3)$.

Because $\mathcal{S}(\alpha)$ is a SCS, we obtain

(A.4) $G_y(W_1 = w_1^*, W_2 = 0, W_3 = w_3) = G_y(W_1 = w_1^*, W_3 = w_3)$,

(A.5) $G_y(W_1 = w_1^*, W_2 = w_2^*, W_3 = w_3) = G_y(W_1 = w_1^*, W_3 = w_3)$.

By (A.1), (A.3), (A.4) and (A.5), we have

$G_y(W_1 = w_1, W_2 = w_2, W_3 = w_3) = G_y(W_1 = w_1^*, W_2 = w_2^*, W_3 = w_3)$.

This implies that $G_y(W_1 = w_1, W_2 = w_2, W_3 = w_3)$ is a constant function of
$(w_1, w_2) \in \Omega_{12|3}(w_3)$. As a result,

      $G_y(\vec{X} = \vec{x}) = G_y(W_1 = w_1, W_2 = w_2, W_3 = w_3)$
(A.6)
               $= G_y(W_3 = w_3) = G_y(\delta^T\vec{X} = \delta^T\vec{x})$.

Applying Lemma 1 of Zeng and Zhu (2008), (A.6) leads to the conclusion
that $\mathcal{S}(\delta)$ is a SCS. Note that Cook (1996, 1998b) has used this constant
function technique to prove his Lemma 2 and Proposition 6.4, respectively.

Case 2 [$\mathrm{rank}(\eta) = p$]. Let
$\Omega_W = \{w : \|w_1\|^2 + \|w_2\|^2 + \|w_3\|^2 = 1\}$, which is a unit contour. In
addition, let $\vec{w}_j = w_j/\|w_j\|$ for $j = 1, \ldots, 3$. Because $\Omega_W$ is a unit
contour, any component of $(\|W_1\|, \|W_2\|, \|W_3\|)$ is uniquely determined by
the other two components. Therefore, for any $w \in \Omega_W$, we have

      $G_y(W = w)$
(A.7)   $= G_y(\vec{W}_1 = \vec{w}_1, \vec{W}_2 = \vec{w}_2, \|W_1\| = \|w_1\|, \|W_2\| = \|w_2\|, W_3 = w_3)$
        $= G_y(\vec{W}_1 = \vec{w}_1, \vec{W}_2 = \vec{w}_2, \|W_1\| = \|w_1\|, W_3 = w_3)$.

Let $(w_1^*, w_2^*)$ be another arbitrary point in $\Omega_W$ such that
$\|w_1^*\|^2 + \|w_2^*\|^2 + \|w_3\|^2 = 1$. In addition, let $\vec{w}_j^* = w_j^*/\|w_j^*\|$ for
$j = 1, \ldots, 3$. Because $\mathcal{S}(\alpha)$ is a SCS, we are able to apply the same
techniques as used in the proof of Case 1 and equation (A.7) to obtain

$G_y(W = w) = G_y(\vec{W}_1 = \vec{w}_1, \vec{W}_2 = \vec{w}_2^*, \|W_1\| = \|w_1\|, W_3 = w_3)$.

Using the fact that $\Omega_W$ is a unit contour, the above equation can be
expressed as

$G_y(W = w) = G_y(\vec{W}_1 = \vec{w}_1, \vec{W}_2 = \vec{w}_2^*, \|W_2\| = (1 - \|w_1\|^2 - \|w_3\|^2)^{1/2}, W_3 = w_3)$.

Moreover, $\mathcal{S}(\beta)$ is a SCS. This allows us to employ the same technique used
in the proof of Case 1 to express the above equation as

      $G_y(W = w) = G_y(\vec{W}_1 = \vec{w}_1^*, \vec{W}_2 = \vec{w}_2^*,$
(A.8)          $\|W_2\| = (1 - \|w_1\|^2 - \|w_3\|^2)^{1/2}, W_3 = w_3)$
        $= G_y(\vec{W}_1 = \vec{w}_1^*, \vec{W}_2 = \vec{w}_2^*, \|W_1\| = \|w_1\|, W_3 = w_3)$.

Note that $(w_1, w_2)$ and $(w_1^*, w_2^*)$ are arbitrary points. Hence, applying the
same argument used in the proof of Case 1, (A.7) and (A.8) lead to

(A.9) $G_y(W = w) = G_y(\|W_1\| = \|w_1\|, W_3 = w_3)$.

Because we assume that $Y|\vec{X}$ is contour asymmetric, $G_y(W = w)$ must be
degenerate on $\|W_1\|$. As a result, $\mathcal{S}(\delta)$ is a SCS.

(Necessity of contour asymmetry). To show that contour asymmetry
is a necessary condition for the existence of the CCS, it suffices to
show that $\mathcal{C}_{y|\vec{x}}$ does not exist, as long as $Y|\vec{X}$ is contour symmetric on
some direction. To this end, we assume that $Y|\vec{X}$ is contour symmetric on
direction $B_1$. Then, there exists another direction $B_2$ satisfying

(A.10) $G_y(\vec{X} = \vec{x}) = G_y(\|P_{B_1}\vec{X}\| = \|P_{B_1}\vec{x}\|, P_{B_2}\vec{X} = P_{B_2}\vec{x})$,

where $\mathcal{S}(B_1) \cap \mathcal{S}(B_2) = 0$, $\mathcal{S}(B_1) \cup \mathcal{S}(B_2) \ne \mathbb{R}^p$, and
$G_y(\vec{X} = \vec{x})$ is a nondegenerate function in $\|P_{B_1}\vec{x}\|$.

It is easy to show that (A.10) implies that $\mathcal{S}(B) = \mathcal{S}(B_1) \cup \mathcal{S}(B_2)$ is a
SCS. Furthermore, because the support of $\vec{X}$ is a unit contour, the
right-hand side of (A.10) can be rewritten as

$G_y(\vec{X} = \vec{x}) = G_y(\|P_{B_{12}^\perp}\vec{X}\| = \|P_{B_{12}^\perp}\vec{x}\|, P_{B_2}\vec{X} = P_{B_2}\vec{x})$,

where $B_{12}^\perp$ is a basis of the orthogonal subspace of $\mathcal{S}(B_1) \cup \mathcal{S}(B_2)$, and

$\{\mathcal{S}(B_1) \cup \mathcal{S}(B_2)\} \cup \mathcal{S}(B_{12}^\perp) = \mathbb{R}^p$.

As a result, $\mathcal{S}(B^*) = \mathcal{S}(B_{12}^\perp) \cup \mathcal{S}(B_2)$ is also a SCS of $Y|\vec{X}$.
However, $\mathcal{S}(B^*) \cap \mathcal{S}(B) = \mathcal{S}(B_2)$ is not a SCS. Otherwise, by (A.10),
$G_y(\vec{X} = \vec{x})$ would be degenerate on $\|P_{B_1}\vec{x}\|$, which contradicts the
definition of contour
symmetry. This completes the proof of necessity.

A.2. Proof of Theorem 2. Without loss of generality, we assume that
there are two different GCSs ($\mathcal{S}_1$ and $\mathcal{S}_2$) with the minimal structural
dimension $d_0$ [i.e., $\dim(\mathcal{S}_1) = \dim(\mathcal{S}_2) = d_0$]. Because $\mathcal{S}_1 \ne \mathcal{S}_2$, we
have $\dim(\mathcal{S}_1 \cap \mathcal{S}_2) < d_0$. Furthermore, according to the assumption that
the structural dimensions of $\mathcal{S}_1$ and $\mathcal{S}_2$ are minimal, $\mathcal{S}_1 \cap \mathcal{S}_2$ cannot
be a SCS. Thus, if we are able to show that $\mathcal{S}_1 \cap \mathcal{S}_2$ is also a SCS, then
the first part of Theorem 2 follows by contradiction. Based on the definition
of $\mathcal{K}_{y|\vec{x}}$, every SCS must contain $\mathcal{K}_{y|\vec{x}}$, which implies the following
inequality:

$\dim(\mathcal{S}_1 \cup \mathcal{S}_2) \le \dim(\mathcal{S}_1) + \dim(\mathcal{S}_2) - \dim(\mathcal{K}_{y|\vec{x}})$.

In addition, according to the condition $d_0 < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$, we have

$\dim(\mathcal{S}_1) + \dim(\mathcal{S}_2) - \dim(\mathcal{K}_{y|\vec{x}}) < 2 \times [\{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2] - \dim(\mathcal{K}_{y|\vec{x}}) = p$.

The above two inequalities together imply that $\dim(\mathcal{S}_1 \cup \mathcal{S}_2) < p$.
Moreover, applying the same techniques used in the proof of Case 1 in
Theorem 1, we obtain the result that $\mathcal{S}_1 \cap \mathcal{S}_2$ is a SCS. The proof of the
first part is complete.

We next show the second part of Theorem 2. Let $\mathcal{G}_{y|\vec{x}}$ be an arbitrary
GCS. Because the CCS $\mathcal{C}_{y|\vec{x}}$ exists, we have $\mathcal{G}_{y|\vec{x}} \supseteq \mathcal{C}_{y|\vec{x}}$. On the
other hand, the GCS is the SCS with the minimal structural dimension and the
CCS is also a SCS. Accordingly, $\dim(\mathcal{G}_{y|\vec{x}}) \le \dim(\mathcal{C}_{y|\vec{x}})$, which leads to
$\mathcal{G}_{y|\vec{x}} = \mathcal{C}_{y|\vec{x}}$. This, together with the uniqueness of $\mathcal{C}_{y|\vec{x}}$, implies
that $\mathcal{G}_{y|\vec{x}}$ is unique. The proof of the second part is complete.

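A.3. Proof of Theorem 3.

Statement (1). Let $\mathcal{B} = \{B : Y \perp\!\!\!\perp X | B^T X\}$ and
$\mathcal{B}^* = \{B : Y \perp\!\!\!\perp \vec{X} | B^T\vec{X}\}$. Applying Lemma 2 of Wang, Ni and Tsai
(2008), we have $\mathcal{B} \subseteq \mathcal{B}^*$. Accordingly,

$\mathcal{K}_{y|\vec{x}} = \bigcap_{B \in \mathcal{B}^*} \mathcal{S}(B) = \Big\{\bigcap_{B \in \mathcal{B}} \mathcal{S}(B)\Big\} \cap \Big\{\bigcap_{B \in \mathcal{B}^* \setminus \mathcal{B}} \mathcal{S}(B)\Big\} \subseteq \bigcap_{B \in \mathcal{B}} \mathcal{S}(B) = \mathcal{S}_{y|x}$.

This completes the proof.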

Statement (2). By Lemma 2 of Wang, Ni and Tsai (2008), we know
that $\mathcal{S}_{y|x}$ is also a SCS. Then, by the definition of the GCS, we have that
the structural dimension of $\mathcal{G}_{y|\vec{x}}$ cannot be larger than that of $\mathcal{S}_{y|x}$.
The proof is complete.

A.4. Proof of Theorem 4. To prove the theorem, we consider two
different cases, namely $Y|\vec{X}$ is contour asymmetric and
$\dim(\mathcal{S}_{y|x}) < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$.

Case 1 ($Y|\vec{X}$ is contour asymmetric). Applying Theorem 1 and the
second part of Theorem 2, we have $\mathcal{G}_{y|\vec{x}} = \mathcal{C}_{y|\vec{x}} = \mathcal{K}_{y|\vec{x}}$. To obtain
$\mathcal{G}_{y|\vec{x}} = \mathcal{S}_{y|x}$, it suffices to show that $\mathcal{K}_{y|\vec{x}} = \mathcal{S}_{y|x}$. If
$\mathcal{K}_{y|\vec{x}} = \mathcal{C}_{y|\vec{x}} = 0$, then $Y \perp\!\!\!\perp \vec{X}$. This implies that $Y = g(R, \varepsilon)$,
where $g(\cdot)$ is an unknown function and $\varepsilon$ is independent of $\vec{X}$ and $R$.
Accordingly, $\mathcal{S}_{y|x} = 0$ or $\mathcal{S}_{y|x} = \mathbb{R}^p$. However, the latter situation
contradicts the condition that $Y|X$ is dimension reducible. Thus, we only
consider the situation where $\dim(\mathcal{K}_{y|\vec{x}}) > 0$.

Let $B$ be a basis of $\mathcal{K}_{y|\vec{x}} = \mathcal{C}_{y|\vec{x}}$. Based on Theorem 3(1), it suffices
to show that $\mathcal{S}(B)$ is a SDR subspace of $Y|X$. From $Y \perp\!\!\!\perp \vec{X} | B^T\vec{X}$, we have
$Y \perp\!\!\!\perp (\vec{X}, R) | (P_B\vec{X}, R)$, which is equivalent to

$G_y(\vec{X} = \vec{x}, R = r) = G_y(P_B\vec{X} = P_B\vec{x}, R = r)$.

Note that $(P_B\vec{X}, R)$ is a one-to-one mapping of $(P_B X, \|P_{B^\perp}X\|)$, where
$B^\perp$ is a basis satisfying $\mathcal{S}(B) \cup \mathcal{S}(B^\perp) = \mathbb{R}^p$ and
$\mathcal{S}(B) \cap \mathcal{S}(B^\perp) = 0$. In addition, $(\vec{X}, R)$ is a one-to-one mapping of
$X$. As a result, we obtain

$G_y(X = x) = G_y(\vec{X} = \vec{x}, R = r)$
        $= G_y(P_B\vec{X} = P_B\vec{x}, R = r)$
        $= G_y(P_B X = P_B x, \|P_{B^\perp}X\| = \|P_{B^\perp}x\|)$,

where $x = r\vec{x}$. Then, Theorem 4 follows if we are able to show that
$G_y(X = x)$ is degenerate on $\|P_{B^\perp}x\|$.

We assume that $Y|X$ is dimension reducible. Hence, if $G_y(X = x)$ is not
degenerate on $\|P_{B^\perp}x\|$, then $G_y(X = x)$ must be degenerate on $\mathcal{S}(B_1)$,
which is a subspace of $\mathcal{S}(B)$ with $\dim\{\mathcal{S}(B_1)\} > 0$. Accordingly,

$G_y(X = x) = G_y(P_{B_2}X = P_{B_2}x, \|P_{B^\perp}X\| = \|P_{B^\perp}x\|)$,

where $B_2$ satisfies $\mathcal{S}(B_1) \cup \mathcal{S}(B_2) = \mathcal{S}(B)$ and
$\mathcal{S}(B_1) \cap \mathcal{S}(B_2) = 0$. This implies that
$\mathcal{S}(B^*) = \{\mathcal{S}(B_2) \cup \mathcal{S}(B^\perp)\}$ is a SDR subspace of $Y|X$. Then, from
Lemma 2 of Wang, Ni and Tsai (2008), $\mathcal{S}(B^*)$ is also a SCS of $Y|\vec{X}$. Because
$\mathcal{S}(B_1) \subseteq \mathcal{S}(B)$ and $\dim\{\mathcal{S}(B_1)\} > 0$, we have
$\mathcal{K}_{y|\vec{x}} = \mathcal{S}(B) \not\subseteq \mathcal{S}(B^*)$, which contradicts the definition of
$\mathcal{K}_{y|\vec{x}}$. Consequently, $G_y(X = x)$ is degenerate on $\|P_{B^\perp}x\|$ and
$\mathcal{K}_{y|\vec{x}} = \mathcal{S}_{y|x}$.

Case 2 [$\dim(\mathcal{S}_{y|x}) < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$]. Similar to Case 1, we
only consider a situation where $\dim(\mathcal{G}_{y|\vec{x}}) > 0$. By Lemma 2 of Wang, Ni and
Tsai (2008), we know that $\mathcal{S}_{y|x}$ is also a SCS. Then, according to the
definition of GCS, we have $\dim(\mathcal{G}_{y|\vec{x}}) \le \dim(\mathcal{S}_{y|x})$. As a result, we
obtain $\mathcal{G}_{y|\vec{x}} \subseteq \mathcal{S}_{y|x}$. Otherwise, applying the condition that
$\dim(\mathcal{G}_{y|\vec{x}}) \le \dim(\mathcal{S}_{y|x}) < \{p + \dim(\mathcal{K}_{y|\vec{x}})\}/2$, together with the
same technique used in the proof of Theorem 2, we can show that
$\{\mathcal{G}_{y|\vec{x}} \cap \mathcal{S}_{y|x}\} \ne \mathcal{G}_{y|\vec{x}}$, but is a SCS. This indicates that the
structural dimension of $\{\mathcal{G}_{y|\vec{x}} \cap \mathcal{S}_{y|x}\}$ is smaller than that of
$\mathcal{G}_{y|\vec{x}}$, which contradicts the definition of GCS. Thus, we must have
$\mathcal{G}_{y|\vec{x}} \subseteq \mathcal{S}_{y|x}$.

We slightly abuse notation by letting $B$ be a basis of $\mathcal{G}_{y|\vec{x}}$, which is
different from Case 1. Following the same argument used in Case 1, we
have $G_y(X = x) = G_y(P_B X = P_B x, \|P_{B^\perp}X\| = \|P_{B^\perp}x\|)$. Recall that
$Y|X$ is dimension reducible. Hence, if $G_y(X = x)$ is not degenerate on
$\|P_{B^\perp}x\|$, then there must exist two bases $B_1$ and $B_2$ such that
$G_y(X = x)$ is degenerate on $P_{B_2}x$, $\mathrm{rank}(B_2) > 0$,
$\mathcal{S}(B_1) \cup \mathcal{S}(B_2) = \mathcal{S}(B)$, and $\mathcal{S}(B_1) \cap \mathcal{S}(B_2) = 0$. As a result,
$G_y(X = x) = G_y(P_{B_1}X = P_{B_1}x, \|P_{B^\perp}X\| = \|P_{B^\perp}x\|)$, which implies
that $\mathcal{S}^* = \mathcal{S}(B_1) \cup \mathcal{S}(B^\perp)$ is a SDR subspace. Moreover,
$\mathcal{S}(B_2) \not\subseteq \mathcal{S}^*$, and hence $\mathcal{S}(B) = \mathcal{G}_{y|\vec{x}} \not\subseteq \mathcal{S}^*$. However,
according to the definition of CS, we must have $\mathcal{S}_{y|x} \subseteq \mathcal{S}^*$, and
hence $\mathcal{G}_{y|\vec{x}} \subseteq \mathcal{S}^*$ by combining the fact $\mathcal{G}_{y|\vec{x}} \subseteq \mathcal{S}_{y|x}$.
Therefore, we will get a contradiction if $G_y(X = x)$ is degenerate on
$P_{B_2}x$. Consequently, $G_y(X = x)$ is degenerate on $\|P_{B^\perp}x\|$, which
implies $G_y(X = x) = G_y(P_B X = P_B x)$, so that $\mathcal{G}_{y|\vec{x}}$ is also a SDR
subspace. Combining this with the previous result $\mathcal{G}_{y|\vec{x}} \subseteq \mathcal{S}_{y|x}$, we
have $\mathcal{G}_{y|\vec{x}} = \mathcal{S}_{y|x}$. The results of Cases 1 and 2 complete the proof.

A.5. Proof of Lemma 1. Let $\mathcal{S}(B)$ be an arbitrary SCS. Applying Lemma
1 of Wang, Ni and Tsai (2008), we have $E(\vec{X}|B^T\vec{X}) = P_B\vec{X}$, which shows
that the linearity condition [Li (1991)] holds on the contour projected
predictor $\vec{X}$. As a result,

$E(\vec{X}|Y) = E[E(\vec{X}|B^T\vec{X}, Y)|Y] = E[E(\vec{X}|B^T\vec{X})|Y] = E(P_B\vec{X}|Y) \in \mathcal{S}(B)$.

Because $\mathcal{S}(B)$ is an arbitrary SCS, we immediately have
$\mathcal{S}\{E(\vec{X}|Y)\} \subseteq \mathcal{K}_{y|\vec{x}}$. This completes the proof.

A.6. Proof of Lemma 2. 

Statement (1). For any S(B) G X, if Y\X* is contour asymmetric, then 
the desired result follows because S(Ba) = S(B). In contrast, if Y\ X* is con- 
tour symmetric, then S(Bg) nfC y ^ = (see the proof of necessity of contour 
asymmetric in Appendix A.l). This implies that K. y \^ = r\s(B)ei^(^ a)- 

Statement (2). For any S(B) £l, if Y\J? is contour symmetric on 
some direction B$, then IC y ^ C S(B^) (see the proof of necessity of contour 
asymmetric in Appendix A.l). It implies that JC y \-^ ^ Gy\if- Otherwise, JCy^ 
should be a SCS. Then, by the previous statement, we know that S(Ba) $\supseteq$
$\mathcal{K}_{y|\vec{x}}$. Hence S(Ba) is also a SCS. Consequently, $G_y(\vec{X} = \vec{x})$ degenerates
on $P_{B_\delta}\vec{x}$, which contradicts the assumption that $Y|\vec{X}$ is contour symmetric
on $B_\delta$.

Statement (3). If Y\~X is contour asymmetric, then Theorem 1 and 
the second part of Theorem 2 lead to fC y \-^ = Cy\-$ = QyVtf- The proof is 
complete. 

A.7. Proof of Theorem 5. Let $M_{\mathrm{SIR}} = \operatorname{cov}\{E(\tilde X \mid Y)\}$, which is the
kernel matrix of CP-SIR. By Lemmas 1 and 2, we have
$\mathcal{S}(M_{\mathrm{SIR}}) \subseteq \mathcal{K}_{Y|\tilde X} = \mathcal{C}_{Y|X} = \mathcal{G}_{Y|\tilde X}$. Next, applying the technique from the
proof of Theorem 3 in Li and Wang (2007), Theorem 5 follows if we are able to show that
$v^T M_{\mathrm{SIR}} v > 0$ for all $v \in \mathcal{G}_{Y|\tilde X}$ with $\|v\| = 1$. Because $E(\tilde X) = 0$, we obtain

\[
v^T M_{\mathrm{SIR}} v = v^T E[E(\tilde X \mid Y) E(\tilde X^T \mid Y)] v = E\{[E(v^T \tilde X \mid Y)]^2\}.
\]

By Assumption 1, $E(v^T \tilde X \mid Y)$ is a nondegenerate function in $Y$. In
conjunction with Jensen's inequality, this leads to

\[
E\{[E(v^T \tilde X \mid Y)]^2\} > \{E[E(v^T \tilde X \mid Y)]\}^2 = \{v^T E(\tilde X)\}^2 = 0.
\]

Accordingly, $v^T M_{\mathrm{SIR}} v > 0$ for all $v \in \mathcal{G}_{Y|\tilde X}$. This completes the proof.
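
The role of the CP-SIR kernel can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the location and scatter are taken as known (zero and identity), the predictor is a spherical Cauchy-type vector, the model $Y = \beta^T X + 0.5\varepsilon$ is chosen for illustration, and the function name `cp_sir_kernel` and the slicing scheme (equal-count slices of $Y$) are hypothetical choices.

```python
import numpy as np

def cp_sir_kernel(x, y, n_slices=10):
    """Sample analogue of cov{E(X~|Y)}: weighted outer products of slice means of x/||x||."""
    x_tilde = x / np.linalg.norm(x, axis=1, keepdims=True)  # contour projection
    order = np.argsort(y)
    kernel = np.zeros((x.shape[1], x.shape[1]))
    for idx in np.array_split(order, n_slices):
        m = x_tilde[idx].mean(axis=0)  # estimate of E(X~ | Y in slice)
        kernel += (len(idx) / len(y)) * np.outer(m, m)
    return kernel

rng = np.random.default_rng(0)
n, p = 5000, 5
beta = np.zeros(p)
beta[0] = 1.0

# Assumed heavy-tailed spherical predictor: Gaussian vector divided by |N(0,1)|
z = rng.standard_normal((n, p))
x = z / np.abs(rng.standard_normal((n, 1)))
y = x @ beta + 0.5 * rng.standard_normal(n)

kernel = cp_sir_kernel(x, y)
eigvals, eigvecs = np.linalg.eigh(kernel)   # eigenvalues in ascending order
lead = eigvecs[:, -1]                        # leading eigenvector
alignment = abs(lead @ beta)                 # |cosine| between estimate and beta
print(f"smallest eigenvalue: {eigvals[0]:.2e}, alignment: {alignment:.3f}")
```

In this toy setting the kernel is nonnegative definite by construction, and its leading eigenvector should lie close to the assumed direction `beta`, in line with $v^T M_{\mathrm{SIR}} v > 0$ on the target subspace.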

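A.8. Proof of Lemma 3. Using the fact that $E(\tilde X \mid B_0^T \tilde X) = P_{B_0} \tilde X$, and
given that $\mathcal{S}(B_0)$ is the GCS, we have

\[
\operatorname{cov}(\tilde X \mid Y) = E[\operatorname{cov}(\tilde X \mid B_0^T \tilde X) \mid Y] + P_{B_0} \operatorname{cov}(\tilde X \mid Y) P_{B_0}.
\]

Let $B_0^{\perp}$ denote an orthonormal basis of the subspace that is the orthogonal
complement of $\mathcal{S}(B_0)$. Furthermore, let $C = (B_0, B_0^{\perp})$ so that $C C^T = I_p$.
Moreover, define

\[
W = \begin{pmatrix} W_0 \\ W_{\perp} \end{pmatrix} = C^T \tilde X,
\qquad W_0 = B_0^T \tilde X, \quad W_{\perp} = (B_0^{\perp})^T \tilde X.
\]

As a result,

\[
\operatorname{cov}(\tilde X \mid B_0^T \tilde X) = \operatorname{cov}(C C^T \tilde X \mid B_0^T \tilde X)
= C \times \operatorname{cov}(W \mid W_0) \times C^T
= C \times \begin{pmatrix} 0 & 0 \\ 0 & \operatorname{cov}(W_{\perp} \mid W_0) \end{pmatrix} \times C^T.
\tag{A.11}
\]

By noting that $\|W\|^2 = \|\tilde X\|^2 = 1$, $\|W_{\perp}\|^2 = 1 - \|W_0\|^2 = 1 - \|B_0^T \tilde X\|^2$, and
$\operatorname{cov}(W) = \operatorname{cov}(\tilde X) = p^{-1} I_p$ [see Wang, Ni and Tsai (2008)], we obtain

\[
\operatorname{cov}(W_{\perp} \mid W_0) = \frac{1 - \|W_0\|^2}{p - d_0} I_{p - d_0} = \frac{1 - \|B_0^T \tilde X\|^2}{p - d_0} I_{p - d_0}.
\tag{A.12}
\]

Applying (A.11) and (A.12), we have

\[
\begin{aligned}
\operatorname{cov}(\tilde X \mid Y)
&= C \times \begin{pmatrix} 0 & 0 \\ 0 & \dfrac{1 - E(\|B_0^T \tilde X\|^2 \mid Y)}{p - d_0} I_{p - d_0} \end{pmatrix} \times C^T + P_{B_0} \operatorname{cov}(\tilde X \mid Y) P_{B_0} \\
&= \frac{1 - \lambda(Y)}{p - d_0} B_0^{\perp} (B_0^{\perp})^T + P_{B_0} \operatorname{cov}(\tilde X \mid Y) P_{B_0} \\
&= \tau(Y) Q_{B_0} + P_{B_0} \operatorname{cov}(\tilde X \mid Y) P_{B_0},
\end{aligned}
\]

where $\lambda(Y) = E(\|B_0^T \tilde X\|^2 \mid Y)$, $\tau(Y) = \{1 - \lambda(Y)\}/(p - d_0)$ and
$Q_{B_0} = I_p - P_{B_0} = B_0^{\perp} (B_0^{\perp})^T$.
This implies that $\{\tau(Y) I_p - \operatorname{cov}(\tilde X \mid Y)\} = P_{B_0} \{\tau(Y) I_p - \operatorname{cov}(\tilde X \mid Y)\} P_{B_0}$.
Subsequently, together with Lemma 1 and Ye and Weiss (2003), Lemma
3, this yields

\[
\mathcal{S}\{\tau(Y) I_p - E(\tilde X \tilde X^T \mid Y)\}
= \mathcal{S}\{\tau(Y) I_p - \operatorname{cov}(\tilde X \mid Y) - E(\tilde X \mid Y) E(\tilde X^T \mid Y)\} \subseteq \mathcal{G}_{Y|\tilde X}.
\]

This completes the proof.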
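
The conditional covariance in (A.12) can be checked by simulation. The sketch below rests on an explicit assumption that is not restated in this proof: given $W_0 = w_0$, the remaining coordinates $W_{\perp}$ are taken to be uniformly distributed on the sphere of squared radius $1 - \|w_0\|^2$ in $\mathbb{R}^{p - d_0}$. The values of `p`, `d0` and `w0` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d0, n = 5, 2, 200_000
w0 = np.array([0.3, 0.4])            # a fixed value of W_0, with ||w0||^2 = 0.25
radius_sq = 1.0 - w0 @ w0            # ||W_perp||^2 = 1 - ||W_0||^2

# Assumption: W_perp | W_0 = w0 is uniform on a sphere of radius sqrt(radius_sq)
g = rng.standard_normal((n, p - d0))
w_perp = np.sqrt(radius_sq) * g / np.linalg.norm(g, axis=1, keepdims=True)

empirical = np.cov(w_perp, rowvar=False)
theoretical = radius_sq / (p - d0) * np.eye(p - d0)   # right-hand side of (A.12)
max_abs_error = np.abs(empirical - theoretical).max()
print(f"max |empirical - theoretical| = {max_abs_error:.4f}")
```

Under this assumption the empirical covariance should match $(1 - \|w_0\|^2)/(p - d_0)\, I_{p - d_0}$ up to Monte Carlo error.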

A.9. Proof of Theorem 6. Let $H = \tau(Y) I_p - E(\tilde X \tilde X^T \mid Y)$; then the
kernel matrix of CP-SAVE is $M_{\mathrm{SAVE}} = E(H^2)$. Because $E[\tau(Y)] = 1/p$, $E(\tilde X) = 0$,
and $\operatorname{cov}(\tilde X) = I_p/p$, we have $E(H) = 0$. Applying Lemma 3 and Ye and
Weiss (2003), Lemma 3, we obtain $\mathcal{S}(M_{\mathrm{SAVE}}) \subseteq \mathcal{G}_{Y|\tilde X}$. Next, if we are able
to show that $v^T M_{\mathrm{SAVE}} v > 0$ for all $v \in \mathcal{G}_{Y|\tilde X}$ with $\|v\| = 1$, then Theorem 6
follows (the same technique has been used in the proof of Theorem 5). It is
easy to see that $v^T M_{\mathrm{SAVE}} v = v^T E[H (I_p - v v^T) H] v + E[(v^T H v)^2]$. Because
$I_p - v v^T$ is a nonnegative definite matrix, the first term on the right-hand side
of the above equation is nonnegative. By Jensen's inequality and $E(H) = 0$,

\[
E[(v^T H v)^2] > [E(v^T H v)]^2 = 0.
\tag{A.13}
\]

The strict inequality in (A.13) holds because $v^T H v$ is nondegenerate, which
is shown below. After algebraic simplification, we have

\[
v^T H v = v^T [\tau(Y) I_p - E(\tilde X \tilde X^T \mid Y)] v
= \tau(Y) - E[(v^T \tilde X)^2 \mid Y] = \tau(Y) - \phi(Y, v).
\]

Note that $\tau(Y) = \{1 - \lambda(Y)\}/(p - d_0)$ and

\[
\lambda(Y) = E(\tilde X^T B_0 B_0^T \tilde X \mid Y) = \sum_{i=1}^{d_0} E(\beta_{0i}^T \tilde X \tilde X^T \beta_{0i} \mid Y) = \sum_{i=1}^{d_0} \phi(Y, \beta_{0i}).
\]

Without loss of generality, we assume that $v = \beta_{01}$. Otherwise, we can
construct a basis $B_0$ whose first column is $v$. As a result,

\[
v^T H v = \frac{1 - \lambda(Y)}{p - d_0} - \phi(Y, \beta_{01})
= -(p - d_0)^{-1} \Biggl[ (p - d_0 + 1) \phi(Y, \beta_{01}) + \sum_{i=2}^{d_0} \phi(Y, \beta_{0i}) - 1 \Biggr],
\]

which is a linear combination of $\phi(Y, \beta_{0i})$, $i = 1, \ldots, d_0$. According to
Assumption 3, $v^T H v$ is nondegenerate. This completes the proof.
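
The decomposition $v^T H^2 v = v^T H (I_p - v v^T) H v + (v^T H v)^2$ used above holds pointwise for any symmetric matrix $H$ and unit vector $v$, before expectations are taken. The following short check is an illustration only, with an arbitrary random symmetric matrix standing in for $H$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 6
a = rng.standard_normal((p, p))
h = (a + a.T) / 2                 # arbitrary symmetric stand-in for H
v = rng.standard_normal(p)
v /= np.linalg.norm(v)            # unit vector

lhs = v @ h @ h @ v                                       # v^T H^2 v
first = v @ h @ (np.eye(p) - np.outer(v, v)) @ h @ v      # nonnegative part
second = (v @ h @ v) ** 2                                 # squared quadratic form
print(lhs, first + second)
```

The first term is nonnegative because $I_p - v v^T$ is a projection, so positivity of $v^T M_{\mathrm{SAVE}} v$ reduces to showing that $v^T H v$ is not almost surely zero.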

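A.10. Proof of Theorem 7. Because $(Y^*, \tilde X^*)$ is an independent copy of
$(Y, \tilde X)$, we have

\[
\begin{aligned}
A(Y, Y^*) &= E[\tilde X \tilde X^T - \tilde X (\tilde X^*)^T - \tilde X^* \tilde X^T + \tilde X^* (\tilde X^*)^T \mid Y, Y^*] \\
&= E[\tilde X \tilde X^T \mid Y] - E[\tilde X \mid Y] E[(\tilde X^*)^T \mid Y^*] \\
&\quad - E[\tilde X^* \mid Y^*] E[\tilde X^T \mid Y] + E[\tilde X^* (\tilde X^*)^T \mid Y^*].
\end{aligned}
\tag{A.14}
\]

Using (A.14), we further obtain

\[
\begin{aligned}
[\tau(Y) + \tau(Y^*)] I_p - A(Y, Y^*)
&= \{\tau(Y) I_p - E(\tilde X \tilde X^T \mid Y)\} + \{\tau(Y^*) I_p - E[\tilde X^* (\tilde X^*)^T \mid Y^*]\} \\
&\quad + \{E[\tilde X \mid Y] E[(\tilde X^*)^T \mid Y^*] + E[\tilde X^* \mid Y^*] E[\tilde X^T \mid Y]\}.
\end{aligned}
\tag{A.15}
\]

Equation (A.15), in conjunction with Lemmas 1 and 3, yields the desired
result.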

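A.11. Proof of Theorem 8. Let $D = [\tau(Y) + \tau(Y^*)] I_p - A(Y, Y^*)$ so that
$M_{\mathrm{DR}} = E(D^2)$. Note that $E[\tau(Y)] = 1/p$, $E(\tilde X) = 0$, $\operatorname{cov}(\tilde X) = I_p/p$, and
$(\tilde X^*, Y^*)$ is an independent copy of $(\tilde X, Y)$. Then applying the results of
(A.14) and (A.15), we are able to show that $E[A(Y, Y^*)] = 2 I_p/p$ and $E(D) = 0$.
Furthermore, employing Theorem 7 together with Ye and Weiss (2003),
Lemma 3, we have $\mathcal{S}(M_{\mathrm{DR}}) \subseteq \mathcal{G}_{Y|\tilde X}$. Accordingly, the theorem follows if we
can show that $v^T M_{\mathrm{DR}} v > 0$ for all $v \in \mathcal{G}_{Y|\tilde X}$ with $\|v\| = 1$. It is easy to see
that

\[
v^T M_{\mathrm{DR}} v = v^T E[D (I_p - v v^T) D] v + E[(v^T D v)^2].
\]

Because $I_p - v v^T$ is a nonnegative definite matrix, the first term on the
right-hand side of the above equation is nonnegative. By Jensen's inequality and
$E(D) = 0$, we have

\[
E[(v^T D v)^2] = \operatorname{var}(v^T D v) > [E(v^T D v)]^2 = 0.
\tag{A.16}
\]

The strict inequality in (A.16) holds because $v^T D v$ is nondegenerate, which
is shown next. Let $\varphi(Y, v) = E(v^T \tilde X \mid Y)$. Then, (A.15) implies that

\[
v^T D v = \tau(Y) + \tau(Y^*) - \phi(Y, v) - \phi(Y^*, v) + 2 \varphi(Y, v) \varphi(Y^*, v).
\]

As noted in the proof of Theorem 6, we have $\tau(Y) = \{1 - \lambda(Y)\}/(p - d_0)$
and $\lambda(Y) = E(\tilde X^T B_0 B_0^T \tilde X \mid Y) = \sum_{i=1}^{d_0} E(\beta_{0i}^T \tilde X \tilde X^T \beta_{0i} \mid Y) = \sum_{i=1}^{d_0} \phi(Y, \beta_{0i})$.
Without loss of generality, we also assume that $v = \beta_{01}$. Accordingly,

\[
\begin{aligned}
v^T D v
&= \frac{1 - \lambda(Y)}{p - d_0} + \frac{1 - \lambda(Y^*)}{p - d_0}
 - \phi(Y, \beta_{01}) - \phi(Y^*, \beta_{01}) + 2 \varphi(Y, \beta_{01}) \varphi(Y^*, \beta_{01}) \\
&= -(p - d_0)^{-1} \Biggl\{ \Biggl[ (p - d_0 + 1) [\phi(Y, \beta_{01}) - p^{-1}] + \sum_{i=2}^{d_0} [\phi(Y, \beta_{0i}) - p^{-1}] \Biggr] \\
&\qquad\qquad + \Biggl[ (p - d_0 + 1) [\phi(Y^*, \beta_{01}) - p^{-1}] + \sum_{i=2}^{d_0} [\phi(Y^*, \beta_{0i}) - p^{-1}] \Biggr] \Biggr\} \\
&\quad + 2 \varphi(Y, \beta_{01}) \varphi(Y^*, \beta_{01}) \\
&= -(p - d_0)^{-1} \{H(Y) + H(Y^*)\} + 2 \varphi(Y, \beta_{01}) \varphi(Y^*, \beta_{01}),
\end{aligned}
\]

where $H(Y) = (p - d_0 + 1) [\phi(Y, \beta_{01}) - p^{-1}] + \sum_{i=2}^{d_0} [\phi(Y, \beta_{0i}) - p^{-1}]$. It is
easy to see that $E\{\phi(Y, \beta_{0i})\} = 1/p$, $E\{\varphi(Y, \beta_{01})\} = 0$, and $E\{H(Y)\} = 0$. In
addition,

\[
\operatorname{cov}[H(Y), \varphi(Y, v) \varphi(Y^*, v)] = E[H(Y) \varphi(Y, v) \varphi(Y^*, v)]
= E[H(Y) \varphi(Y, v)] E[\varphi(Y^*, v)] = 0.
\]

Hence, we have

\[
\begin{aligned}
\operatorname{var}(v^T D v)
&= \frac{2 \operatorname{var}[H(Y)]}{(p - d_0)^2} + 4 \operatorname{var}[\varphi(Y, \beta_{01})] \operatorname{var}[\varphi(Y^*, \beta_{01})] \\
&= \frac{2 \operatorname{var}[H(Y)]}{(p - d_0)^2} + 4 \{\operatorname{var}[\varphi(Y, \beta_{01})]\}^2.
\end{aligned}
\tag{A.17}
\]

If Assumption 1 holds and $Y \mid \tilde X$ is contour asymmetric, we then have $\varphi(Y, \beta_{01})$
nondegenerate. Thus, the second term in (A.17) is strictly positive. Otherwise,
if Assumption 2 holds and $Y \mid \tilde X$ is dimension reducible, we then have
$H(Y)$ nondegenerate. Therefore, the first term in (A.17) is strictly positive.
As a result, $\operatorname{var}(v^T D v) > 0$, or equivalently, $v^T D v$ is nondegenerate. This
completes the proof.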
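
A.12. Proof of Theorem 9. Because $(Y^*, \tilde X^*)$ is an independent copy of
$(Y, \tilde X)$ and $E[\tau(Y)] = 1/p$, we have

\[
\begin{aligned}
M_{\mathrm{DR}} &= E\{[[\tau(Y) + \tau(Y^*)] I_p - A(Y, Y^*)]^2\} \\
&= \{2 E\{[\tau(Y)]^2\} + 2 p^{-2}\} I_p - 4 E\{\tau(Y) A(Y, Y^*)\} + E\{[A(Y, Y^*)]^2\}.
\end{aligned}
\tag{A.18}
\]

Using the result that $E(\tilde X) = 0$ and $\operatorname{cov}(\tilde X) = I_p/p$, we can simplify
$E\{\tau(Y) A(Y, Y^*)\}$ in the above equation as follows:

\[
\begin{aligned}
E[\tau(Y) A(Y, Y^*)]
&= E[\tau(Y) E(\tilde X \tilde X^T \mid Y)] + E[\tau(Y)] E[\tilde X^* (\tilde X^*)^T] \\
&\quad - E[\tau(Y) E(\tilde X \mid Y)] E[(\tilde X^*)^T] - E(\tilde X^*) E[\tau(Y) E(\tilde X^T \mid Y)] \\
&= E[\tau(Y) E(\tilde X \tilde X^T \mid Y)] + p^{-2} I_p.
\end{aligned}
\tag{A.19}
\]

Furthermore, the third term of (A.18) can be reduced to

\[
\begin{aligned}
E\{[A(Y, Y^*)]^2\}
&= 2 \bigl\{ E[[E(\tilde X \tilde X^T \mid Y)]^2] + [E[E(\tilde X \mid Y) E(\tilde X^T \mid Y)]]^2 \\
&\qquad + E[E(\tilde X^T \mid Y) E(\tilde X \mid Y)] E[E(\tilde X \mid Y) E(\tilde X^T \mid Y)] + p^{-2} I_p \bigr\}.
\end{aligned}
\tag{A.20}
\]

Substituting (A.20) and (A.19) into (A.18), we obtain

\[
\begin{aligned}
M_{\mathrm{DR}} &= 2 \bigl\{ E[\tau^2(Y)] I_p + E[E^2(\tilde X \tilde X^T \mid Y)] + E^2[E(\tilde X \mid Y) E(\tilde X^T \mid Y)] \\
&\qquad + E[E(\tilde X^T \mid Y) E(\tilde X \mid Y)] E[E(\tilde X \mid Y) E(\tilde X^T \mid Y)]
 - 2 E[\tau(Y) E(\tilde X \tilde X^T \mid Y)] \bigr\}.
\end{aligned}
\]

This completes the proof.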
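
The closed form for $M_{\mathrm{DR}}$ is an algebraic identity once $E[\tau(Y)] = 1/p$, $E[E(\tilde X \mid Y)] = 0$ and $E[E(\tilde X \tilde X^T \mid Y)] = I_p/p$ hold, so it can be checked exactly for a discrete response. The sketch below is illustrative only: the conditional moments `m`, `a` and `tau` are hypothetical random arrays, recentred so that the three constraints hold, and are not moments of an actual contour-projected distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 4, 5
prob = rng.dirichlet(np.ones(k))                      # P(Y = y_j), j = 1..k

# Hypothetical conditional moments, recentred to satisfy the constraints
m = rng.standard_normal((k, p))
m -= prob @ m                                         # E[E(X~|Y)] = 0
a = rng.standard_normal((k, p, p))
a = (a + a.transpose(0, 2, 1)) / 2                    # symmetric E(X~X~^T|Y)
a += np.eye(p) / p - np.einsum("j,jab->ab", prob, a)  # E[E(X~X~^T|Y)] = I_p/p
tau = rng.standard_normal(k)
tau += 1.0 / p - prob @ tau                           # E[tau(Y)] = 1/p

def expect(values):
    """Expectation over Y of an array indexed by Y along axis 0."""
    return np.einsum("j,j...->...", prob, values)

# Direct evaluation of E(D^2) over all pairs (Y, Y*) of independent copies
direct = np.zeros((p, p))
for i in range(k):
    for j in range(k):
        a_ij = a[i] + a[j] - np.outer(m[i], m[j]) - np.outer(m[j], m[i])
        d = (tau[i] + tau[j]) * np.eye(p) - a_ij
        direct += prob[i] * prob[j] * (d @ d)

# Closed form from the end of the proof of Theorem 9
mm = expect(np.einsum("ja,jb->jab", m, m))            # E[E(X~|Y)E(X~^T|Y)]
closed = 2 * (expect(tau**2) * np.eye(p) + expect(a @ a) + mm @ mm
              + expect(np.einsum("ja,ja->j", m, m)) * mm
              - 2 * expect(tau[:, None, None] * a))
print(np.abs(direct - closed).max())
```

The two matrices agree to floating-point accuracy, which mirrors the cancellation of the $p^{-2} I_p$ terms when (A.19) and (A.20) are substituted into (A.18).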

A.13. Proof of Theorem 10.

Statement (1). According to Assumption 3, we are able to find an
$a > 0$, such that $h(t) t^{-\alpha} < 2 C_{\alpha}$ for $0 < t < a$. Thus, we have

\[
\begin{aligned}
E\|X\|^{-4} &= \int_0^{\infty} h(t) t^{-2} \, dt \\
&= \int_a^{\infty} h(t) t^{-2} \, dt + \int_0^a h(t) t^{-\alpha} \times t^{-(2 - \alpha)} \, dt \\
&\le \int_a^{\infty} h(t) a^{-2} \, dt + \int_0^a 2 C_{\alpha} t^{-(2 - \alpha)} \, dt \\
&\le a^{-2} \int_0^{\infty} h(t) \, dt + 2 C_{\alpha} \int_0^a t^{-(2 - \alpha)} \, dt.
\end{aligned}
\tag{A.21}
\]

Because $h(t)$ is a probability density function, the first term on the
right-hand side of (A.21) is finite. By Assumption 3, we have $\alpha > 1$. Thus, the
second term on the right-hand side of (A.21) is also finite. Consequently, we
have $E\|X\|^{-4} < \infty$. This completes the proof.

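Statement (2). By the definition of the $\|\cdot\|_{\hat\Sigma}$-norm, one can easily verify
that

\[
\begin{aligned}
\max_{1 \le i \le n} \left| \frac{\|x_i - \hat\mu\|_{\hat\Sigma}^2}{\|x_i\|_{\hat\Sigma}^2} - 1 \right|
&= \max_{1 \le i \le n} \left| \frac{\|\hat\mu\|_{\hat\Sigma}^2 - 2 x_i^T \hat\Sigma^{-1} \hat\mu}{\|x_i\|_{\hat\Sigma}^2} \right| \\
&\le \max_{1 \le i \le n} \frac{\|\hat\mu\|_{\hat\Sigma}^2 + 2 |x_i^T \hat\Sigma^{-1} \hat\mu|}{\|x_i\|_{\hat\Sigma}^2} \\
&\le \max_{1 \le i \le n} \frac{\|\hat\mu\|_{\hat\Sigma}^2 + 2 \|x_i\|_{\hat\Sigma} \times \|\hat\mu\|_{\hat\Sigma}}{\|x_i\|_{\hat\Sigma}^2} \\
&\le \|\hat\Sigma^{-1}\| \times \|\hat\Sigma\| \times \max_{1 \le i \le n} \frac{\|\hat\mu\|^2}{\|x_i\|^2}
 + 2 (\|\hat\Sigma^{-1}\| \times \|\hat\Sigma\|)^{1/2} \max_{1 \le i \le n} \frac{\|\hat\mu\|}{\|x_i\|} \\
&= \|\hat\Sigma^{-1}\| \times \|\hat\Sigma\| \times \frac{\|\hat\mu\|^2}{\min_i \|x_i\|^2}
 + 2 (\|\hat\Sigma^{-1}\| \times \|\hat\Sigma\|)^{1/2} \frac{\|\hat\mu\|}{\min_i \|x_i\|}.
\end{aligned}
\]

Therefore, Statement (2) follows if we are able to show that
$\|\hat\mu\|^2 / \{\min_i \|x_i\|^2\} \to_p 0$. To this end, we compute the following quantity: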

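\[
\begin{aligned}
P\Bigl( \min_{1 \le i \le n} \|x_i\|^2 > c n^{-1/(\alpha + 1)} \Bigr)
&= P^n \bigl( \|x_1\|^2 > c n^{-1/(\alpha + 1)} \bigr) \\
&= \bigl\{ 1 - P \bigl( \|x_1\|^2 \le c n^{-1/(\alpha + 1)} \bigr) \bigr\}^n \\
&= \Biggl\{ 1 - \int_0^{c n^{-1/(\alpha + 1)}} h(t) \, dt \Biggr\}^n \\
&= \Biggl\{ 1 - \int_0^{c n^{-1/(\alpha + 1)}} t^{-\alpha} h(t) \times t^{\alpha} \, dt \Biggr\}^n,
\end{aligned}
\tag{A.22}
\]

where $c > 0$ is an arbitrary constant. By Assumption 3, we have $t^{-\alpha} h(t) > C_{\alpha}/2$
on the range of integration when $n$ is large enough. As a result, the right-hand side of (A.22) is
bounded by

\[
\Biggl\{ 1 - \frac{C_{\alpha}}{2} \int_0^{c n^{-1/(\alpha + 1)}} t^{\alpha} \, dt \Biggr\}^n
= \Biggl\{ 1 - \frac{C_{\alpha} c^{\alpha + 1}}{2 (\alpha + 1) n} \Biggr\}^n
\to \exp\Biggl\{ - \frac{C_{\alpha} c^{\alpha + 1}}{2 (\alpha + 1)} \Biggr\}.
\tag{A.23}
\]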
Moreover, for any arbitrarily small $\varepsilon > 0$, we can find a sufficiently large $c$
such that the right-hand side of (A.23) is smaller than $\varepsilon/2$. Hence, we have
that

\[
\limsup_{n \to \infty} P\Bigl( \min_{1 \le i \le n} \|x_i\|^2 > c n^{-1/(\alpha + 1)} \Bigr) < \varepsilon,
\]

which implies $\min_{1 \le i \le n} \|x_i\|^2 = O_p(n^{-1/(\alpha + 1)})$. This together with the fact
$\|\hat\mu\| = O_p(n^{-1/2})$ leads to

\[
\frac{\|\hat\mu\|^2}{\min_{1 \le i \le n} \|x_i\|^2} = O_p(n^{-1} \times n^{1/(\alpha + 1)}) = O_p(n^{-\alpha/(\alpha + 1)}) = o_p(1).
\]

The proof is complete.
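
The limit in (A.23) is the standard one, $(1 - \kappa/n)^n \to e^{-\kappa}$ with $\kappa = C_{\alpha} c^{\alpha + 1} / \{2(\alpha + 1)\}$. A short numerical illustration follows; the values of `alpha`, `c_alpha` and `c` are arbitrary choices for the sketch and do not come from the paper.

```python
import math

def bound_a23(n, c, alpha, c_alpha):
    """Finite-n expression {1 - C_alpha c^(alpha+1) / (2 (alpha+1) n)}^n in (A.23)."""
    return (1.0 - c_alpha * c ** (alpha + 1) / (2 * (alpha + 1) * n)) ** n

def limit_a23(c, alpha, c_alpha):
    """Limiting value exp{-C_alpha c^(alpha+1) / (2 (alpha+1))} in (A.23)."""
    return math.exp(-c_alpha * c ** (alpha + 1) / (2 * (alpha + 1)))

alpha, c_alpha = 1.5, 1.0          # illustrative values only
gap = abs(bound_a23(10**6, 2.0, alpha, c_alpha) - limit_a23(2.0, alpha, c_alpha))
print(gap, limit_a23(2.0, alpha, c_alpha), limit_a23(4.0, alpha, c_alpha))
```

The limit decreases in $c$, which is why a sufficiently large $c$ makes the right-hand side of (A.23) smaller than any prescribed $\varepsilon/2$.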

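A.14. Proof of Theorem 11. Define
$g_j(t) = n_k^{-1} \sum_{i=1}^n z_{ik} (x_{ij} - t \hat\mu_j) \|x_i - t \hat\mu\|_{\hat\Sigma}^{-1}$,
where $0 \le t \le 1$ and $j = 1, \ldots, p$. Then the $j$th component of
$n_k^{-1} \sum_{i=1}^n z_{ik} (x_i - \hat\mu) / \|x_i - \hat\mu\|_{\hat\Sigma}$ is $g_j(1)$. Because the first-order
derivative of $g_j(t)$, $\dot g_j(\cdot)$, is a continuous function of $t$, there must exist
a $0 < \tilde t < 1$ such that $g_j(1) - g_j(0) = \dot g_j(\tilde t)$, where

\[
\dot g_j(t) = - \Biggl\{ n_k^{-1} \sum_{i=1}^n z_{ik} \frac{\hat\mu_j}{\|x_i - t \hat\mu\|_{\hat\Sigma}}
+ n_k^{-1} \sum_{i=1}^n z_{ik} (x_{ij} - t \hat\mu_j) \frac{(t \hat\mu - x_i)^T \hat\Sigma^{-1} \hat\mu}{\|x_i - t \hat\mu\|_{\hat\Sigma}^3} \Biggr\}.
\tag{A.24}
\]

Let $\tilde\mu = \tilde t \hat\mu = (\tilde\mu_1, \ldots, \tilde\mu_p)^T \in \mathbb{R}^p$. Then, by (A.24), we obtain

\[
|g_j(1) - g_j(0)| = |\dot g_j(\tilde t)|
= \Biggl| n_k^{-1} \sum_{i=1}^n z_{ik} \Biggl\{ \frac{\hat\mu_j}{\|x_i - \tilde\mu\|_{\hat\Sigma}}
- \frac{e_j^T (x_i - \tilde\mu) \, \hat\mu^T \hat\Sigma^{-1} (x_i - \tilde\mu)}{\|x_i - \tilde\mu\|_{\hat\Sigma}^3} \Biggr\} \Biggr|
\tag{A.25}
\]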

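for $1 \le j \le p$, where $e_j$ is defined in Section 2.3. After simple calculations,
the right-hand side of (A.25) is bounded by

\[
\begin{aligned}
& \frac{n}{n_k} \times \frac{1}{n} \sum_{i=1}^n \Biggl\{ \frac{|\hat\mu_j|}{\|x_i - \tilde\mu\|_{\hat\Sigma}}
+ \frac{|x_{ij} - \tilde\mu_j| \times |\hat\mu^T \hat\Sigma^{-1} (x_i - \tilde\mu)|}{\|x_i - \tilde\mu\|_{\hat\Sigma}^3} \Biggr\} \\
&\quad \le \frac{n}{n_k} \times \frac{1}{n} \sum_{i=1}^n \Biggl\{ \frac{\|\hat\mu\|}{\|\hat\Sigma\|^{-1/2} \|x_i - \tilde\mu\|}
+ \frac{\|x_i - \tilde\mu\| \times \|\hat\Sigma^{-1}\| \times \|\hat\mu\| \times \|x_i - \tilde\mu\|}{\|\hat\Sigma\|^{-3/2} \|x_i - \tilde\mu\|^3} \Biggr\} \\
&\quad = \frac{n}{n_k} \{1 + \|\hat\Sigma^{-1}\| \times \|\hat\Sigma\|\} \times \|\hat\mu\| \times \|\hat\Sigma\|^{1/2}
\times \frac{1}{n} \sum_{i=1}^n \|x_i - \tilde\mu\|^{-1}.
\end{aligned}
\tag{A.26}
\]

Obviously, $n/n_k = O_p(1)$. Because $\hat\Sigma \to_p \Sigma$, we have both $\|\hat\Sigma\|^{1/2} = O_p(1)$
and $\|\hat\Sigma^{-1/2}\| = O_p(1)$. These results together with $\|\hat\mu\| = O_p(n^{-1/2})$ imply
that the right-hand side of (A.26) is $O_p(n^{-1/2})$ if we are able to show that
$n^{-1} \sum_{i=1}^n \|x_i - \tilde\mu\|^{-1} = O_p(1)$. To this end, we compute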



\[
\frac{1}{n} \sum_{i=1}^n \bigl| \|x_i - \tilde\mu\|^{-1} - \|x_i\|^{-1} \bigr|
\le \Biggl\{ \max_{1 \le i \le n} \Biggl| \frac{\|x_i\|}{\|x_i - \tilde\mu\|} - 1 \Biggr| \Biggr\}
\times \frac{1}{n} \sum_{i=1}^n \|x_i\|^{-1}.
\tag{A.27}
\]

By Theorem 10(ii), the first term on the right-hand side of (A.27) is $o_p(1)$. In
addition, by Theorem 10(i) and the Law of Large Numbers, the second term
on the right-hand side of (A.27) is $O_p(1)$. As a result, the right-hand side
of (A.27) is $o_p(1)$. This together with the result of $n^{-1} \sum_{i=1}^n \|x_i\|^{-1} = O_p(1)$
means that the last term on the right-hand side of (A.26) is $O_p(1)$. Hence,
the left-hand side of (A.25) is $O_p(n^{-1/2})$. Next, consider



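\[
\begin{aligned}
\Biggl| n_k^{-1} \sum_{i=1}^n z_{ik} \Biggl\{ \frac{x_{ij}}{\|x_i\|_{\hat\Sigma}} - \frac{x_{ij}}{\|x_i\|} \Biggr\} \Biggr|
&\le \frac{n}{n_k} \times \frac{1}{n} \sum_{i=1}^n \frac{|x_{ij}| \times \bigl| \|x_i\| - \|x_i\|_{\hat\Sigma} \bigr|}{\|x_i\|_{\hat\Sigma} \times \|x_i\|} \\
&\le \frac{n}{n_k} \times \frac{1}{n} \sum_{i=1}^n \frac{\bigl| \|x_i\|^2 - \|x_i\|_{\hat\Sigma}^2 \bigr|}{\|x_i\|_{\hat\Sigma} \{\|x_i\| + \|x_i\|_{\hat\Sigma}\}} \\
&\le \frac{n}{n_k} \times \frac{1}{n} \sum_{i=1}^n \frac{\|I_p - \hat\Sigma^{-1}\| \times \|x_i\|^2}{\|\hat\Sigma\|^{-1/2} \times \|x_i\| \{\|x_i\| + \|\hat\Sigma\|^{-1/2} \|x_i\|\}} \\
&= \frac{n}{n_k} \times \|I_p - \hat\Sigma^{-1}\| \times \|\hat\Sigma\|^{1/2} \{1 + \|\hat\Sigma\|^{-1/2}\}^{-1}.
\end{aligned}
\tag{A.28}
\]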
By Theorem 4.2 of Tyler (1987), we have $\|I_p - \hat\Sigma^{-1}\| = O_p(n^{-1/2})$. This
implies that $\|\hat\Sigma\|^{-1/2} = O_p(1)$. Furthermore, $n/n_k = O_p(1)$. Accordingly, the
right-hand side of (A.28) is of order $O_p(n^{-1/2})$. This indicates that the
left-hand side of (A.28) is also of order $O_p(n^{-1/2})$. This in conjunction with (A.25)
implies that

\[
n_k^{-1} \sum_{i=1}^n z_{ik} \Biggl\{ \frac{x_i - \hat\mu}{\|x_i - \hat\mu\|_{\hat\Sigma}} - \frac{x_i}{\|x_i\|} \Biggr\} = O_p(n^{-1/2}).
\tag{A.29}
\]

Because the $L_2$-norm of $x_i / \|x_i\|$ is 1, we then apply the central limit
theorem to have $n_k^{-1} \sum_{i=1}^n z_{ik} x_i / \|x_i\| = E(\tilde X \mid Y = k) + O_p(n^{-1/2})$. This result
together with (A.29) shows that
$n_k^{-1} \sum_{i=1}^n z_{ik} (x_i - \hat\mu) / \|x_i - \hat\mu\|_{\hat\Sigma} = E(\tilde X \mid Y = k) + O_p(n^{-1/2})$.
Finally, using the fact that $\hat\Sigma - I_p = O_p(n^{-1/2})$, we have

\[
\bar{\tilde x}_k = \hat\Sigma^{-1/2} \, \frac{1}{n_k} \sum_{i=1}^n z_{ik} \frac{x_i - \hat\mu}{\|x_i - \hat\mu\|_{\hat\Sigma}} = E(\tilde X \mid Y = k) + O_p(n^{-1/2}).
\]

This completes the proof.